写入 .txt 文件 (UTF-8)，python

Question

提问by Gusto

I want to save the output (contents) to a file (saving it in UTF-8). The file shouldn't be overwritten, it should be saved as a new file - e.g. file2.txtSo, I fists open a file.txt, encode it in UTF-8, do some stuff and then wanna save it to file2.txtin UTF-8. How do I do this?

我想将输出 ( contents)保存到文件中（以 UTF-8 格式保存）。该文件不应该被覆盖，它应该被保存为一个新文件 - 例如file2.txt，我拳头打开一个file.txt，用 UTF-8 编码，做一些事情，然后想把它保存到file2.txtUTF-8。我该怎么做呢？

import codecs
def openfile(filename):
    with codecs.open(filename, encoding="UTF-8") as F:
        contents = F.read()
        ...

Answer 1

采纳答案by adamk

The short way:

简短的方法：

file('file2.txt','w').write( file('file.txt').read().encode('utf-8') )

The long way:

路漫漫其修远兮：

data = file('file.txt').read()
... process data ...
data = data.encode('utf-8')
file('file2.txt','w').write( data )

And using 'codecs' explicitly:

并明确使用“编解码器”：

codecs.getwriter('utf-8')(file('/tmp/bla3','w')).write(data)

Answer 2

回答by Ignacio Vazquez-Abrams

Open a second file. Use contextlib.nested()if need be. Use shutil.copyfileobj()to copy the contents.

打开第二个文件。使用contextlib.nested()，如果需要的话。使用shutil.copyfileobj()复制的内容。

Answer 3

回答by Ignacio Vazquez-Abrams

I like to separate concerns in situations like this - I think it really makes the code cleaner, easier to maintain, and can be more efficient.

我喜欢在这种情况下分离关注点 - 我认为这确实使代码更清晰，更易于维护，并且可以更高效。

Here you've 3 concerns: reading a UTF-8 file, processing the lines, and writing a UTF-8 file. Assuming your processing is line-based, this works perfectly in Python, since opening and iterating over lines of a file is built in to the language. As well as being clearer, this is more efficient too since it allows you process huge files that don't fit into memory. Finally, it gives you a great way to test your code - because processing is separated from file io it lets you write unit tests, or even just run the processing code on example text and manually review the output without fiddling around with files.

这里有 3 个问题：读取 UTF-8 文件、处理行和写入 UTF-8 文件。假设您的处理是基于行的，这在 Python 中非常有效，因为打开和迭代文件的行是语言内置的。除了更清晰之外，这也更有效，因为它允许您处理不适合内存的大文件。最后，它为您提供了一种测试代码的好方法——因为处理与文件 io 分开，它允许您编写单元测试，甚至只是在示例文本上运行处理代码并手动查看输出而无需摆弄文件。

I'm converting the lines to upper case for the purposes of example - presumably your processing will be more interesting. I like using yield here - it makes it easy for the processing to remove or insert extra lines although that's not being used in my trivial example.

出于示例的目的，我将这些行转换为大写 - 大概您的处理会更有趣。我喜欢在这里使用 yield - 尽管在我的琐碎示例中没有使用它，但它可以轻松地处理删除或插入额外的行。

def process(lines):
    for line in lines:
        yield line.upper()

with codecs.open(file1, 'r', 'utf-8') as infile:
    with codecs.open(file2, 'w', 'utf-8') as outfile:
        for line in process(infile):
            outfile.write(line)

写入 .txt 文件 (UTF-8)，python

提问by Gusto

采纳答案by adamk

回答by Ignacio Vazquez-Abrams

回答by Ignacio Vazquez-Abrams

相关推荐

最近更新

标签

写入 .txt 文件 (UTF-8)，python

提问by Gusto

采纳答案by adamk

回答by Ignacio Vazquez-Abrams

回答by Ignacio Vazquez-Abrams

相关推荐

从python生成电影而不将单个帧保存到文件

Python 无法在 Mac OS X 上安装 matplotlib

如何在 Python 中继承和扩展列表对象？

python中的点积

相关推荐

最近更新

标签