Python: remove the last character in a file
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/18857352/
Remove very last character in file
Asked by user2681562
After looking all over the Internet, I've come to this.
Let's say I have already made a text file that reads:
Hello World
Well, I want to remove the very last character (in this case d) from this text file.
So now the text file should look like this: Hello Worl
But I have no idea how to do this.
All I want, more or less, is a single backspace function for text files on my HDD.
This needs to work on Linux as that's what I'm using.
Accepted answer by Martijn Pieters
Use fileobject.seek() to seek 1 position from the end, then use file.truncate() to remove the remainder of the file:
import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()
This works fine for single-byte encodings. If you have a multi-byte encoding (such as UTF-16 or UTF-32) you need to seek back enough bytes from the end to account for a single codepoint.
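For a fixed-width encoding, the same idea can be sketched by truncating a whole character's worth of bytes. This is a minimal sketch, not part of the original answer: the function name `remove_last_fixed_width_char` and its `bytes_per_char` parameter are hypothetical, and it assumes every character really occupies the same number of bytes (4 for UTF-32; 2 for UTF-16 only when the last character is not encoded as a surrogate pair).

```python
import os

def remove_last_fixed_width_char(path, bytes_per_char):
    """Drop the last character of a file whose encoding uses a fixed
    number of bytes per character (assumption: e.g. 4 for UTF-32)."""
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Never truncate below zero on a short or empty file.
        f.truncate(max(size - bytes_per_char, 0))
```

For a UTF-32 file this would be called as remove_last_fixed_width_char(filename, 4).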
For variable-byte encodings, it depends on the codec if you can use this technique at all. For UTF-8, you need to find the first byte (from the end) where bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 codepoint:
with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backward until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object, so convert it to an integer before masking
    while ord(filehandle.read(1)) & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()
Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.
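To see why the `& 0xC0` test works, the check can be tried on an in-memory byte string instead of a file. The sketch below is illustrative only: the helper names `is_utf8_continuation` and `last_char_start` are made up for this example, and it assumes the input is valid UTF-8.

```python
def is_utf8_continuation(byte_value):
    """True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return byte_value & 0xC0 == 0x80

def last_char_start(data):
    """Return the index at which the last UTF-8 character of `data` begins."""
    index = max(len(data) - 1, 0)
    # Walk backward over continuation bytes until a lead byte (or ASCII byte) is hit.
    while index > 0 and is_utf8_continuation(data[index]):
        index -= 1
    return index

encoded = 'naïve €'.encode('utf-8')   # the euro sign takes 3 bytes in UTF-8
start = last_char_start(encoded)
print(encoded[:start].decode('utf-8'))
```

ASCII bytes never match the continuation pattern, so for pure ASCII input the loop stops immediately at the last byte.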
Answer by dawg
with open(urfile, 'rb+') as f:
    f.seek(0, 2)          # end of file
    size = f.tell()       # the size...
    f.truncate(size - 1)  # truncate at that size - however many characters
Be sure to use binary mode on Windows, since Unix file line endings may return an illegal or incorrect character count.
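The same size-minus-one truncation can also be sketched without opening the file from Python, using os.path.getsize() and os.truncate(). This is a different route from the answer's code, shown only for comparison: the function name `chop_last_byte` is hypothetical, and it assumes Python 3.3 or later on Linux and a last character that is exactly one byte long.

```python
import os

def chop_last_byte(path):
    """Shorten the file at `path` by one byte (assumes a 1-byte last character)."""
    size = os.path.getsize(path)
    if size > 0:
        # os.truncate works on a path, so no file object is needed.
        os.truncate(path, size - 1)
```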
Answer by quasoft
Martijn's accepted answer is simple and mostly works, but it does not account for text files with:
- UTF-8 encoding containing non-English characters (which is the default encoding for text files in Python 3)
- one newline character at the end of the file (which is the default in Linux editors like vim or gedit)
If the text file contains non-English characters, neither of the answers provided so far would work.
What follows is an example that solves both problems and also allows removing more than one character from the end of the file:
import os

def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to read
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
    """
    with open(filename, 'rb+') as f:
        last_char = None

        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            if ignore_newlines:
                if b == 0x0D or b == 0x0A:
                    offset += 1
                    continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF8 character
                chars += 1
                if chars == count:
                    # When `count` number of characters have been found, move current position back
                    # with one byte (to include the byte just checked) and truncate the file
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1
How it works:
- Reads only the last few bytes of a UTF-8 encoded text file in binary mode
- Iterates over the bytes backwards, looking for the start of a UTF-8 character
- Once count characters (not counting newlines) have been found, truncates the file at the start of the last one found
Sample text file - bg.txt:
Здравей свят
How to use:
filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())
Outputs:
Before truncate: Здравей свят
After truncate: Здравей свя
This works with both UTF-8 and ASCII encoded files.
Answer by vins mv
Here is a dirty way (erase and recreate). I don't advise using this, but it is possible to do it like this:
import os
with open("file") as f:
    x = f.read()
os.remove("file")
# reopen in write mode; the default mode 'r' cannot write
with open("file", "w") as f:
    f.write(x[:-1])
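If the whole file is going to be rewritten anyway, a somewhat safer variant of the erase-and-recreate idea is to write the shortened text to a temporary file and then swap it into place. The following is a sketch under stated assumptions, not the answer's method: the function name `rewrite_without_last_char` is hypothetical, the file is assumed to be UTF-8 and small enough to read into memory, and the replacement file gets the temporary file's permissions rather than the original's.

```python
import os
import tempfile

def rewrite_without_last_char(path, encoding='utf-8'):
    """Rewrite `path` without its final character via a temporary file."""
    # newline='' keeps line endings exactly as they are stored on disk.
    with open(path, encoding=encoding, newline='') as src:
        text = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # os.replace stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'w', encoding=encoding, newline='') as dst:
        dst.write(text[:-1])
    os.replace(tmp_path, path)
```

Because the original is only replaced after the new content has been written, a failure partway through leaves the original file in place.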
Answer by metinsenturk
If the file is not open in binary mode and you only have 'w' (write) access, I can suggest the following:
f.seek(f.tell() - 1, os.SEEK_SET)
f.truncate()
In the code above, f.seek() only accepts an offset derived from f.tell() because the file is not open in binary mode (no 'b'), so you cannot seek backwards relative to the end. The seek moves the cursor to the start of the last character. Writing an empty string there would leave the file unchanged, so call f.truncate() to cut the file at that position. This assumes the last character occupies a single byte.
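A minimal runnable sketch of this text-mode approach follows. It writes to a scratch file whose path is made up for the example, and it calls f.truncate() after the seek, since the file only gets shorter once it is truncated at the new position. It assumes the last character written is a single byte (plain ASCII here).

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('Hello World')
    # Step back one position from the current write position...
    f.seek(f.tell() - 1, os.SEEK_SET)
    # ...and cut the file off there.
    f.truncate()

with open(path) as f:
    result = f.read()
print(result)
```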
Answer by Coddy
with open('file.txt', 'r+') as f:  # 'r+' keeps the existing content; 'w' would empty the file on open
    f.seek(0, 2)             # seek to end of file; f.seek(0, os.SEEK_END) is legal
    f.seek(f.tell() - 2, 0)  # seek to the second last char of file; f.seek(f.tell()-2, os.SEEK_SET) is legal
    f.truncate()
This removes the last two characters. Whether to step back 2 or 1 depends on what the last character of the file is: it could be a newline (\n) or anything else.
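One way to avoid guessing is to look at the last byte first and decide how far to step back. The sketch below is illustrative: the function name `backspace` is hypothetical, and it assumes a single-byte encoding such as ASCII with LF line endings. Like the code above, it drops the trailing newline together with the character before it.

```python
import os

def backspace(path):
    """Remove the last character, plus a trailing LF newline if present.

    Assumes a single-byte encoding such as ASCII.
    """
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return  # nothing to remove from an empty file
        f.seek(-1, os.SEEK_END)
        ends_with_newline = f.read(1) == b'\n'
        # Drop 2 bytes when the file ends in a newline, otherwise 1.
        f.truncate(max(size - (2 if ends_with_newline else 1), 0))
```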