How to safely write to a file on Windows?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1812115/

Time: 2020-09-15 13:30:16  Source: igfitidea

How to safely write to a file?

Tags: python, windows, file

Asked by Rickard Lindberg

Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content, you can call a write method to save the content back to the file. The question is how to do this in a safe way.

Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done, you end up with a half-written file and you have lost data.

A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.

Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.

On POSIX systems I guess you can use the rename system call, which is an atomic operation. But how would you best do this on a Windows system? In particular, how do you best handle this using Python?

Also, is there another scheme for safely writing to files?

Accepted answer by Shailesh Kumar

Python's documentation states that a successful os.rename() is an atomic operation (this is a POSIX requirement). So in your case, writing the data to a temporary file and then renaming it to the original file would be quite safe.
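
The write-then-rename idea could be sketched roughly as follows. This is only an illustration, not code from the answer: the function name write_via_rename and the ".tmp" suffix are made up here. It uses os.replace() (Python 3.3+) instead of os.rename(), because on Windows os.rename() raises an error when the destination already exists, while os.replace() overwrites it.

```python
import os


def write_via_rename(path, data):
    """Write data to a temporary file, then rename it over path."""
    tmp_path = path + ".tmp"  # hypothetical naming convention
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(data)
    # os.replace overwrites an existing destination on POSIX and Windows;
    # os.rename would fail on Windows if path already exists.
    os.replace(tmp_path, path)
```

If the write fails partway, only the ".tmp" file is affected and the original stays as it was.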

Another way could work like this:

  • let original file be abc.xml
  • create abc.xml.tmp and write new data to it
  • rename abc.xml to abc.xml.bak
  • rename abc.xml.tmp to abc.xml
  • after new abc.xml is properly put in place, remove abc.xml.bak

As you can see, you still have abc.xml.bak, which you can use to restore the original if anything goes wrong with the tmp file or with moving it into place.
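
A minimal sketch of the .tmp/.bak steps listed above, assuming Python 3.3+ for os.replace(). The function name save_with_backup is hypothetical, and the restore-on-failure branch is one possible reading of "use it to restore", not something the answer spells out.

```python
import os


def save_with_backup(path, data):
    """Replace path with data, keeping path + '.bak' until the swap is done."""
    tmp_path = path + ".tmp"
    bak_path = path + ".bak"

    # Write the new data to abc.xml.tmp first.
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(data)

    had_original = os.path.exists(path)
    if had_original:
        os.replace(path, bak_path)  # abc.xml -> abc.xml.bak

    try:
        os.replace(tmp_path, path)  # abc.xml.tmp -> abc.xml
    except OSError:
        if had_original:
            os.replace(bak_path, path)  # put the original back
        raise

    if had_original:
        os.remove(bak_path)  # new file is in place; drop the backup
```

Note that between the two renames there is a short window in which abc.xml does not exist, so a reader that opens the file at that moment has to be prepared to look for the .bak file.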

Answer by u0b34a0f6ae

If you want to be POSIXly correct and safe, you have to:

  1. Write to temporary file
  2. Flush and fsync the file (or fdatasync)
  3. Rename over the original file

Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall on disk I/O for whole seconds as a result, depending on other outstanding I/O.
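
In Python, the three steps (write, flush and fsync, rename) might look like the sketch below. The function name durable_write and the ".tmp" suffix are assumptions made for the example; os.fsync() and os.replace() are standard library calls.

```python
import os


def durable_write(path, data):
    """Write bytes to a temp file, force them to disk, then rename over path."""
    tmp_path = path + ".tmp"  # hypothetical naming convention
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its buffers to disk
    # Only rename once the data has been handed to the disk.
    os.replace(tmp_path, path)
```

The flush() call comes first because os.fsync() only acts on what the operating system has already received; data still sitting in Python's buffer would not be covered.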

Notice that rename is not an atomic operation in POSIX -- at least not in relation to file data, as you might expect. However, most operating systems and filesystems will work this way. But it seems you missed the very large Linux discussion about Ext4 and filesystem guarantees about atomicity. I don't know exactly where to link, but here is a start: ext4 and data loss.

Notice, however, that on many systems rename will be as safe in practice as you expect. However, in a way it is not possible to get both performance and reliability across all possible Linux configurations!

With a write to a temporary file, then a rename of the temporary file, one would expect the operations are dependent and would be executed in order.

The issue, however, is that most, if not all, filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take journaling in HFS+ or Ext3/4, for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself first, then the user's data, in that order.

Ext4 did break the rename expectation when it first came out; however, heuristics were added to resolve it. The issue is not a failed rename, but a successful rename. Ext4 might successfully register the rename, but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file with neither the original nor the new data.

So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!

Answer by Michal Sznajder

In the Win API I found a quite nice function, ReplaceFile, that does what its name suggests, even with an optional backup. There is always the DeleteFile, MoveFile combo as well.

In general what you want to do is really good. And I cannot think of any better write scheme.

Answer by S.Lott

The standard solution is this.

  1. Write a new file with a similar name. X.ext# for example.

  2. When that file has been closed (and perhaps even read and checksummed), then you do two renames.

    • X.ext (the original) to X.ext~

    • X.ext# (the new one) to X.ext

  3. (Only for the crazy paranoids) call the OS sync function to force dirty buffer writes.

At no time is anything lost or corruptible. The only glitch can happen during the renames. But you haven't lost anything or corrupted anything. The original is recoverable right up until the final rename.
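
The scheme above, including the optional read-back and checksum, could be sketched like this. The function name checked_save is invented for the example and SHA-256 is an arbitrary choice of checksum; the "#" and "~" suffixes follow the X.ext# and X.ext~ names used in the steps.

```python
import hashlib
import os


def checked_save(path, data):
    """Write bytes to path + '#', verify them, then do the two renames."""
    new_path = path + "#"  # X.ext#
    old_path = path + "~"  # X.ext~
    expected = hashlib.sha256(data).hexdigest()

    with open(new_path, "wb") as f:
        f.write(data)

    # Read the closed file back and compare checksums before touching X.ext.
    with open(new_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        os.remove(new_path)
        raise IOError("checksum mismatch; original left untouched")

    if os.path.exists(path):
        os.replace(path, old_path)  # X.ext  -> X.ext~
    os.replace(new_path, path)      # X.ext# -> X.ext
```

This sketch leaves X.ext~ in place as a backup of the previous version. Keep in mind that the read-back is usually served from the operating system's cache, so it mainly catches logic errors rather than disk faults.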

Answer by miku

A simplistic solution: use tempfile to create a temporary file, and if writing succeeds, just rename the file to your original configuration file.
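
One way this might look with the standard tempfile module is sketched below. The function name save_with_tempfile is hypothetical. The temporary file is created in the same directory as the target (an assumption added here) so that the final rename does not have to cross filesystems.

```python
import os
import tempfile


def save_with_tempfile(path, text):
    """Write text to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

tempfile.mkstemp() creates the file readable and writable only by the current user, so the permissions of the replaced file may need to be restored afterwards if they matter.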

For locking a file, see portalocker.

Answer by Mahmoud Hashemi

There's now a codified, pure-Python, and I dare say Pythonic solution to this in the boltons utility library: boltons.fileutils.atomic_save.

Just pip install boltons, then:

from boltons.fileutils import atomic_save

with atomic_save('/path/to/file.txt') as f:
    f.write('this will only overwrite if it succeeds!\n')

There are a lot of practical options, all well-documented. Full disclosure, I am the author of boltons, but this particular part was built with a lot of community help. Don't hesitate to drop a note if something is unclear!

Answer by Jason R. Coombs

Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.

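from ctypes import windll  # Windows only
from ctypes.wintypes import BOOL, DWORD, LPVOID, LPWSTR

ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
    LPWSTR,  # name of the file to be replaced
    LPWSTR,  # name of the replacement file
    LPWSTR,  # name of the backup file (optional)
    DWORD,   # replace flags
    LPVOID,  # reserved
    LPVOID,  # reserved
    ]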

REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4

I then tested the behavior using this script.

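import os
from jaraco.windows.api.filesystem import ReplaceFile

# Create the original file and the file that will replace it.
with open('orig-file', 'w') as f:
    f.write('some content')
with open('replacing-file', 'w') as f:
    f.write('new content')

# Replace orig-file with replacing-file; the old content goes to orig-backup.
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)

with open('orig-file') as f:
    assert f.read() == 'new content'
with open('orig-backup') as f:
    assert f.read() == 'some content'
assert not os.path.exists('replacing-file')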

While this only works in Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.

Answer by unutbu

You could use the fileinput module to handle the backing-up and in-place writing for you:

import fileinput
for line in fileinput.input(filename,inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup
    # standard output is redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.

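    # manipulate line (process is your own function)
    newline = process(line)
    # assuming process() keeps the line's trailing '\n', do not add another
    print(newline, end='')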

If you need to read in the entire contents before you can write the new lines, then you can do that first, then print the entire new contents with:

newcontents=process(contents)
for line in fileinput.input(filename,inplace=True, backup='.bak'):
    print(newcontents)
    break

If the script ends abruptly, you will still have the backup.
