Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19591458/

Time: 2020-08-19 14:07:51  Source: igfitidea

Python reading from a file and saving to utf-8

Tags: python, python-2.7, utf-8

Asked by aarelovich

I'm having problems reading from a file, processing its string and saving to a UTF-8 file.

Here is the code:

try:
    filehandle = open(filename, "r")
except IOError:
    print("Could not open file " + filename)
    quit()

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.

And then

try:
    writer = open(output, "w")
except IOError:
    print("Could not open file " + output)
    quit()

#data = text.decode("iso 8859-15")
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This outputs the file perfectly, but according to my editor it is in ISO 8859-15. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happened. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines, the resulting file has gibberish, mainly in special characters: words with tildes, since the text is in Spanish. I would really appreciate any help, as I am stumped...
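
The gibberish from the commented lines is consistent with the input already being UTF-8: in Python 2, read() on a file opened with plain open returns the raw UTF-8 bytes, and decoding those bytes as ISO 8859-15 turns each two-byte UTF-8 sequence into two separate Latin characters. The sketch below reproduces this in Python 3, where bytes and text are separate types; the sample word "año" is only illustrative.

```python
# Python 3 sketch: simulate what the question's commented-out lines do
# to text that was read from a UTF-8 file as raw bytes.
raw = "año".encode("utf-8")          # bytes as stored in a UTF-8 file: b'a\xc3\xb1o'

# Wrong: interpret the UTF-8 bytes as ISO 8859-15.
# Each byte becomes its own character, giving 'aÃ±o'.
wrong = raw.decode("iso8859-15")

# Re-encoding that string as UTF-8 "double-encodes" it:
# b'a\xc3\x83\xc2\xb1o' instead of the original b'a\xc3\xb1o'.
double_encoded = wrong.encode("utf-8")

# Right: decode with the encoding the file actually uses.
right = raw.decode("utf-8")          # 'año'
```

In other words, if the input is UTF-8, decoding with "utf-8" rather than "iso 8859-15" is what the commented lines would need.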

Accepted answer by Mark Tolonen

Process text to and from Unicode at the I/O boundaries of your program using the codecs module:

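import codecs

# Decode UTF-8 bytes to Unicode on the way in
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode Unicode back to UTF-8 on the way out (separate output file)
with codecs.open(output, 'w', encoding='utf8') as f:
    f.write(text)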
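
One way to confirm that the output really is UTF-8 is to inspect the raw bytes codecs.open writes. This is a minimal sketch that should run on Python 2.7 or 3; the temporary path and the sample Spanish text are assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical output path in a throwaway directory
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

text = u"Año nuevo, mañana"          # Unicode text after processing

# codecs.open encodes the Unicode text as UTF-8 on write
with codecs.open(out_path, "w", encoding="utf8") as f:
    f.write(text)

# Read the raw bytes back to see what actually landed on disk;
# ñ (U+00F1) should appear as the two UTF-8 bytes C3 B1
with open(out_path, "rb") as f:
    raw = f.read()

round_trip = raw.decode("utf8")
```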

Edit: The io module is now recommended instead of codecs; it is compatible with Python 3's open syntax. If you are using Python 3 and don't need Python 2 compatibility, you can simply use the built-in open.

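import io

# Decode UTF-8 bytes to Unicode on the way in
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
# Encode Unicode back to UTF-8 on the way out (separate output file)
with io.open(output, 'w', encoding='utf8') as f:
    f.write(text)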
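
On Python 3 the built-in open accepts the same encoding argument, so the io.open version works with the io. prefix dropped. The sketch below follows the question's read, process, write flow in Python 3; the temporary file names, the sample sentence, and the .upper() "processing" step are hypothetical stand-ins. Writing to a separate output path keeps the input file intact.

```python
import os
import tempfile

# Hypothetical input/output paths in a throwaway directory
workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "input.txt")
output = os.path.join(workdir, "output.txt")

# Create a sample UTF-8 input file (stand-in for the real data)
with open(filename, "w", encoding="utf-8") as f:
    f.write("El niño comió piña.\n")

# Decode on the way in...
with open(filename, "r", encoding="utf-8") as f:
    text = f.read()

# ...process as str (placeholder step)...
text = text.upper()

# ...and encode on the way out.
with open(output, "w", encoding="utf-8") as f:
    f.write(text)
```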

Answer by Fernando Freitas Alves

You can't do that with the built-in open in Python 2; use codecs instead.

When you open a file with the built-in open function in Python 2, it reads and writes raw bytes and does no encoding or decoding for you. To write UTF-8, try this:

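import codecs
# codecs.open encodes the Unicode strings you write as UTF-8
with codecs.open('data.txt', 'w', 'utf-8') as f:
    f.write(text)  # text should be a unicode string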

Answer by Siva Kumar

You can also handle it with the following code:

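# Python 3 (or io.open in Python 2); errors="ignore" silently drops bytes that aren't valid UTF-8
with open(completefilepath, 'r', encoding='utf8', errors='ignore') as f:
    text = f.read()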
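
Keep in mind that errors="ignore" makes the read succeed by silently discarding any bytes that are not valid UTF-8, so a file that is actually in another encoding (for example ISO 8859-15) loses characters instead of raising an error. The Python 3 sketch below uses bytes.decode as a stand-in for reading a file, since open applies the same error handlers; the sample word is illustrative.

```python
# "año" encoded as ISO 8859-15: ñ is the single byte 0xF1, not valid UTF-8 here
latin = "año".encode("iso8859-15")                 # b'a\xf1o'

# errors="ignore": the invalid byte is silently dropped -> 'ao'
ignored = latin.decode("utf8", errors="ignore")

# errors="replace": the invalid byte becomes U+FFFD, so the damage is visible
replaced = latin.decode("utf8", errors="replace")

# Default errors="strict": decoding fails loudly
try:
    latin.decode("utf8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

If silently losing the ñ is not acceptable, the default strict handling (or errors="replace" while debugging) makes an encoding mismatch easier to spot.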