Seeking from end of file throwing unsupported exception
Original question: http://stackoverflow.com/questions/21533391/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by seriousgeek
I have this code snippet and I'm trying to seek backwards from the end of the file using Python:
    f = open(r'D:\SGStat.txt', 'a')
    f.seek(0, 2)
    f.seek(-3, 2)
This throws the following exception while running:
f.seek(-3,2)
io.UnsupportedOperation: can't do nonzero end-relative seeks
Am I missing something here?
Accepted answer by jonrsharpe
From the documentation for Python 3.2 and up:
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)).
Therefore, you can change your program to read:
f = open(r'D:\SGStat.txt', 'ab')
f.seek(0, 2)
f.seek(-3, 2)
However, you should be aware that adding the b flag when you are reading or writing text can have unintended consequences (with a multibyte encoding, for example), and in fact changes the type of data read or written. For a more thorough discussion of the cause of the problem, and a solution that does not require adding the b flag, see another answer to this question.
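As a rough illustration of both points, here is a small sketch. The file name and its contents are made up for the example; it only assumes a UTF-8 file containing at least one multibyte character. It shows text mode refusing the end-relative seek, and binary mode accepting it while counting bytes rather than characters.

```python
import io
import os
import tempfile

# Hypothetical sample file; the name and contents are assumptions for illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("caf\u00e9")  # the last character is two bytes in UTF-8: 5 bytes total

# Text mode: seek(0, 2) is allowed, but a nonzero end-relative seek is refused.
text_mode_error = None
with open(path, "r", encoding="utf-8") as f:
    f.seek(0, os.SEEK_END)
    try:
        f.seek(-3, os.SEEK_END)
    except io.UnsupportedOperation as exc:
        text_mode_error = exc

# Binary mode: the same seeks work, but offsets count bytes and read() returns bytes.
with open(path, "rb") as f:
    f.seek(-3, os.SEEK_END)
    tail = f.read()   # three bytes holding only two characters
    f.seek(-1, os.SEEK_END)
    half = f.read()   # the second byte of the last character on its own

# Decoding from the middle of a character fails.
decode_error = None
try:
    half.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(text_mode_error))
print(tail, len(tail.decode("utf-8")))
print(repr(decode_error))
```

So the b flag makes the seek legal, but any offset arithmetic then has to respect character boundaries before the bytes are decoded.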
Answer by Vikas Thada
To seek relative to the current position or the end of the file, you have to open the text file in binary mode. In this example I created a file "nums.txt" containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ", then read the letters of the string "PYTHON" from the file and displayed them. I ran this code in Python 3.6 on Windows with Anaconda 4.2:
    >>> file=open('nums.txt','rb')
    >>> file.seek(15,0)
    15
    >>> file.read(1).decode('utf-8')
    'P'
    >>> file.seek(8,1)
    24
    >>> file.read(1).decode('utf-8')
    'Y'
    >>> file.seek(-7,2)
    19
    >>> file.read(1).decode('utf-8')
    'T'
    >>> file.seek(7,0)
    7
    >>> file.read(1).decode('utf-8')
    'H'
    >>> file.seek(6,1)
    14
    >>> file.read(1).decode('utf-8')
    'O'
    >>> file.seek(-2,1)
    13
    >>> file.read(1).decode('utf-8')
    'N'
Answer by Eric Lindsey
The existing answers do answer the question, but provide no solution.
From readthedocs:
If the file is opened in text mode (without b), only offsets returned by tell() are legal. Use of other offsets causes undefined behavior.
This is supported by the documentation, which says that:
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file [os.SEEK_SET] are allowed...
This means if you have this code from old Python:
f.seek(-1, 1)   # seek -1 from current position
it would look like this in Python 3:
f.seek(f.tell() - 1, os.SEEK_SET)   # os.SEEK_SET == 0
Solution
Putting this information together, we can achieve the OP's goal:
import os
f.seek(0, os.SEEK_END)              # seek to end of file; f.seek(0, 2) is legal
f.seek(f.tell() - 3, os.SEEK_SET)   # go backwards 3 bytes
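For a self-contained version of the same idea, here is a minimal sketch. It assumes an ASCII-only file, so that one character is one byte; the file name and contents are invented for the example. Note that subtracting from the value of f.tell() goes beyond what the documentation quoted above promises for text mode, so treat this as something that happens to work in CPython rather than a guarantee.

```python
import os
import tempfile

# Hypothetical ASCII-only sample file (name and contents are assumptions).
path = os.path.join(tempfile.mkdtemp(), "ascii.txt")
with open(path, "w", encoding="ascii", newline="") as f:
    f.write("ABCDEFGHIJ")

with open(path, "r", encoding="ascii", newline="") as f:
    f.seek(0, os.SEEK_END)                # legal in text mode
    end = f.tell()                        # offset of the end of the file
    f.seek(max(end - 3, 0), os.SEEK_SET)  # absolute seek, clamped at the start
    last_three = f.read()

print(last_three)
```

The max(..., 0) guard is an addition for files shorter than three bytes; the original two-line snippet would pass a negative offset in that case.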
Answer by Philip Couling
Eric Lindsey's answer does not work because UTF-8 files can have more than one byte per character. Worse, for those of us who speak English as a first language and work with English-only files, it might work just long enough to get out into production code and really break things.
The following answer is based on undefined behavior
... but it does work for now for UTF-8 in Python 3.7.
To seek backwards through a file in text mode, you can do so as long as you correctly handle the UnicodeDecodeError caused by seeking to a byte which is not the start of a UTF-8 character. Since we are seeking backwards, we can simply seek back an extra byte until we find the start of the character.
The result of f.tell() is still the byte position in the file for UTF-8 files, at least for now. So an f.seek() to an invalid offset will raise a UnicodeDecodeError when you subsequently call f.read(), and this can be corrected by calling f.seek() again with a different offset. At least this works for now.
E.g., seeking to the beginning of a line (just after the \n):
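import os

# Start one position before the current offset, clamped to the start of the file.
pos = max(f.tell() - 1, 0)
f.seek(pos, os.SEEK_SET)
while pos > 0:
    try:
        character = f.read(1)
        if character == '\n':
            break  # the file is now positioned just after the newline
    except UnicodeDecodeError:
        pass  # landed inside a multibyte character; keep moving back
    pos -= 1
    f.seek(pos, os.SEEK_SET)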
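To try this out end to end, the loop can be wrapped in a function and run against a small file. This is only a sketch: seek_to_line_start is a hypothetical helper name, the sample file is made up, and the whole thing leans on the same undocumented behavior described above (f.tell() returning a byte offset for UTF-8 text files).

```python
import os
import tempfile


def seek_to_line_start(f):
    """Move a UTF-8 text-mode file back to the start of the current line.

    Hypothetical helper: it does arithmetic on f.tell(), which the
    documentation does not guarantee to work in text mode.
    """
    pos = max(f.tell() - 1, 0)
    f.seek(pos, os.SEEK_SET)
    while pos > 0:
        try:
            if f.read(1) == "\n":
                return  # positioned just after the newline
        except UnicodeDecodeError:
            pass  # landed inside a multibyte character; back up further
        pos -= 1
        f.seek(pos, os.SEEK_SET)


# Made-up sample data: two lines, each containing a two-byte character.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("h\u00e9llo\nw\u00f6rld")

with open(path, "r", encoding="utf-8", newline="") as f:
    f.seek(0, os.SEEK_END)  # the one end-relative seek that text mode allows
    seek_to_line_start(f)
    last_line = f.read()

print(ascii(last_line))
```

Because the search steps back one byte at a time, it is fine for short lines but slow for very long ones; reading fixed-size blocks in binary mode and decoding afterwards would be the usual alternative when that matters.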

