Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4112894/

Time: 2020-08-18 14:19:27  Source: igfitidea

Writing to a .txt file (UTF-8), python

Tags: python, save

Asked by Gusto

I want to save the output (contents) to a file, encoded in UTF-8. The original file shouldn't be overwritten; the output should be saved as a new file, e.g. file2.txt. So I first open file.txt, read it as UTF-8, do some stuff, and then want to save the result to file2.txt in UTF-8. How do I do this?

import codecs
def openfile(filename):
    with codecs.open(filename, encoding="UTF-8") as F:
        contents = F.read()
        ...

Accepted answer by adamk

The short way:

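# open() replaces file(), which exists only in Python 2; binary mode keeps the decode/encode steps explicit
open('file2.txt', 'wb').write( open('file.txt', 'rb').read().decode('utf-8').encode('utf-8') )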

The long way:

data = open('file.txt', 'rb').read().decode('utf-8')  # bytes -> text
# ... process data ...
data = data.encode('utf-8')  # text -> bytes
open('file2.txt', 'wb').write( data )

And using 'codecs' explicitly:

import codecs
codecs.getwriter('utf-8')(open('/tmp/bla3', 'wb')).write(data)  # data must still be text here, not encoded bytes
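
On Python 3, the same read, process, write flow can also be sketched with the built-in open() and its encoding parameter. This is only a sketch under assumptions: the helper name convert_file and the default uppercase transform are hypothetical stand-ins for whatever "do some stuff" means in the question.

```python
def convert_file(src_path, dst_path, transform=str.upper):
    """Read src_path as UTF-8, apply transform, write the result to dst_path as UTF-8.

    convert_file and transform are illustrative names, not part of any library.
    """
    # open() decodes the UTF-8 bytes to str while reading
    with open(src_path, "r", encoding="utf-8") as src:
        data = src.read()

    data = transform(data)  # placeholder for the real processing

    # open() encodes str back to UTF-8 while writing; src_path is left untouched
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(data)
    return dst_path
```

Calling convert_file('file.txt', 'file2.txt') would leave file.txt as it is and create (or replace) file2.txt.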

Answer by Ignacio Vazquez-Abrams

Open a second file. Use contextlib.nested() if need be (it was removed in Python 3, where a single with statement can open both files). Use shutil.copyfileobj() to copy the contents.
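
A minimal sketch of this suggestion, assuming Python 3 and a source file that is already valid UTF-8. Note that shutil.copyfileobj() copies the stream unchanged, so this covers the "save as a new file" part only, not any processing in between. The function name copy_utf8 is hypothetical.

```python
import shutil


def copy_utf8(src_path, dst_path):
    # Both files are opened in one with statement and closed together on exit
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # Copies chunk by chunk, so the whole file is never held in memory
        shutil.copyfileobj(src, dst)
```

Opening both files as UTF-8 text means an invalid byte sequence in the source raises UnicodeDecodeError instead of being copied silently.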

Answer by Ignacio Vazquez-Abrams

I like to separate concerns in situations like this - I think it really makes the code cleaner and easier to maintain, and it can be more efficient.

Here you have 3 concerns: reading a UTF-8 file, processing the lines, and writing a UTF-8 file. Assuming your processing is line-based, this works well in Python, since opening a file and iterating over its lines is built into the language. As well as being clearer, this is more efficient, since it lets you process huge files that don't fit into memory. Finally, it gives you a good way to test your code: because processing is separated from file I/O, you can write unit tests, or even just run the processing code on example text and manually review the output without fiddling around with files.

I'm converting the lines to upper case as an example - presumably your processing will be more interesting. I like using yield here - it makes it easy for the processing to remove or insert extra lines, although that's not being used in my trivial example.

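import codecs

def process(lines):
    # Generator: yields one processed line at a time
    for line in lines:
        yield line.upper()

# file1 and file2 are the input and output paths, defined elsewhere
with codecs.open(file1, 'r', 'utf-8') as infile:
    with codecs.open(file2, 'w', 'utf-8') as outfile:
        for line in process(infile):
            outfile.write(line)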
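
Because process only needs an iterable of lines, it can be exercised on in-memory text without touching any files. The sketch below repeats the generator so it runs on its own; drop_blank is a hypothetical variant, added here only to illustrate that a generator can leave lines out.

```python
import io


def process(lines):
    for line in lines:
        yield line.upper()


def drop_blank(lines):
    # Hypothetical variant: skips blank lines, uppercases the rest
    for line in lines:
        if line.strip():
            yield line.upper()


# io.StringIO behaves like an open text file, so it can stand in for infile
sample = io.StringIO(u"first\n\nsecond\n")
result = list(process(sample))
print(result)

# A plain list of strings works as well
print(list(drop_blank(["a\n", "\n", "b\n"])))
```

The same process function can then be passed the real infile unchanged, since it never needs to know where its lines come from.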