Python: Seeking from end of file throwing unsupported exception

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21533391/

Time: 2020-08-18 23:05:52  Source: igfitidea

Seeking from end of file throwing unsupported exception

Tags: python, python-3.x, file

Asked by seriousgeek

I have this code snippet, and I'm trying to seek backwards from the end of the file using Python:

f = open(r'D:\SGStat.txt', 'a')
f.seek(0, 2)
f.seek(-3, 2)

This throws the following exception while running:

f.seek(-3,2)
io.UnsupportedOperation: can't do nonzero end-relative seeks

Am I missing something here?

Accepted answer by jonrsharpe

From the documentation for Python 3.2 and up:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)).

Therefore, you can change your program to read:

f = open(r'D:\SGStat.txt', 'ab')
f.seek(0, 2)
f.seek(-3, 2)

However, you should be aware that adding the b flag when you are reading or writing text can have unintended consequences (with multibyte encodings, for example), and in fact changes the type of data read or written. For a more thorough discussion of the cause of the problem, and a solution that does not require adding the b flag, see another answer to this question.
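
As a minimal sketch of the difference the quoted documentation describes: a nonzero end-relative seek is rejected in text mode but allowed in binary mode, where the result comes back as bytes. The helper names tail_bytes and text_mode_end_seek_fails and the throwaway file are made up for illustration and are not part of the original answer.

```python
import io
import os
import tempfile


def tail_bytes(path, n):
    """Return the last n bytes of the file (hypothetical helper)."""
    with open(path, 'rb') as f:              # binary mode: end-relative seeks are allowed
        size = f.seek(0, os.SEEK_END)        # seek() returns the new absolute position
        f.seek(-min(n, size), os.SEEK_END)   # never seek before the start of the file
        return f.read()                      # bytes, not str


def text_mode_end_seek_fails(path):
    """Return True if a nonzero end-relative seek is rejected in text mode."""
    with open(path, 'r') as f:
        try:
            f.seek(-3, os.SEEK_END)
        except io.UnsupportedOperation:
            return True
    return False


# Demo on a throwaway file (assumed content, ASCII only).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'hello world')
demo_path = tmp.name

print(tail_bytes(demo_path, 3).decode('ascii'))   # the caller has to decode the bytes
print(text_mode_end_seek_fails(demo_path))
os.remove(demo_path)
```

Decoding the tail is only safe here because the demo content is ASCII; with a multibyte encoding the slice could start in the middle of a character.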

Answer by Vikas Thada

To seek relative to the current position or the end of the file, you have to open the text file in binary mode. In this example I created a file "nums.txt" containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ", then read the letters of the string "PYTHON" from it and displayed them. The session below was run with Python 3.6 on Windows (Anaconda 4.2):

    >>> file=open('nums.txt','rb')
    >>> file.seek(15,0)
    15
    >>> file.read(1).decode('utf-8')
    'P'
    >>> file.seek(8,1)
    24
    >>> file.read(1).decode('utf-8')
    'Y'
    >>> file.seek(-7,2)
    19
    >>> file.read(1).decode('utf-8')
    'T'
    >>> file.seek(7,0)
    7
    >>> file.read(1).decode('utf-8')
    'H'
    >>> file.seek(6,1)
    14
    >>> file.read(1).decode('utf-8')
    'O'
    >>> file.seek(-2,1)
    13
    >>> file.read(1).decode('utf-8')
    'N'
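
The same session can be sketched as a script that uses the named constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END instead of 0, 1 and 2. The helper name spell_python is hypothetical, and the sketch assumes the file holds exactly the 26 uppercase letters with no trailing newline.

```python
import os
import tempfile


def spell_python(path):
    """Replay the seeks from the session above and collect the letters read."""
    moves = [
        (15, os.SEEK_SET),   # absolute offset 15            -> 'P'
        (8, os.SEEK_CUR),    # 8 past the current position   -> 'Y'
        (-7, os.SEEK_END),   # 7 bytes before the end        -> 'T'
        (7, os.SEEK_SET),    # absolute offset 7             -> 'H'
        (6, os.SEEK_CUR),    # 6 past the current position   -> 'O'
        (-2, os.SEEK_CUR),   # 2 before the current position -> 'N'
    ]
    letters = []
    with open(path, 'rb') as f:              # binary mode allows all three whence values
        for offset, whence in moves:
            f.seek(offset, whence)
            letters.append(f.read(1).decode('ascii'))
    return ''.join(letters)


# Recreate the assumed input file and run the helper.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
print(spell_python(tmp.name))   # prints PYTHON
os.remove(tmp.name)
```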

Answer by Eric Lindsey

The existing answers do answer the question, but provide no solution.

From readthedocs:

If the file is opened in text mode (without b), only offsets returned by tell() are legal. Use of other offsets causes undefined behavior.

This is supported by the documentation, which says that:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file [os.SEEK_SET] are allowed...

This means if you have this code from old Python:

f.seek(-1, 1)   # seek -1 from current position

it would look like this in Python 3:

f.seek(f.tell() - 1, os.SEEK_SET)   # os.SEEK_SET == 0

Solution

Putting this information together, we can achieve the OP's goal:

f.seek(0, os.SEEK_END)              # seek to end of file; f.seek(0, 2) is legal
f.seek(f.tell() - 3, os.SEEK_SET)   # go backwards 3 bytes
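
A self-contained sketch of this approach follows. It assumes an ASCII-only file, so that one character is one byte, and it relies on tell() returning a plain byte offset, which the documentation describes as an opaque value. The helper name read_last_chars is hypothetical.

```python
import os
import tempfile


def read_last_chars(path, n):
    """Read the last n characters of an ASCII-only text file (hypothetical helper)."""
    with open(path, 'r', encoding='ascii', newline='') as f:
        f.seek(0, os.SEEK_END)             # legal: zero offset from the end
        start = max(f.tell() - n, 0)       # assumption: one byte per character
        f.seek(start, os.SEEK_SET)         # absolute seek, allowed in text mode
        return f.read()


# Demo on a throwaway file with assumed ASCII content.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'hello world')
print(read_last_chars(tmp.name, 3))        # prints rld
os.remove(tmp.name)
```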

Answer by Philip Couling

Eric Lindsey's answer does not work because UTF-8 files can have more than one byte per character. Worse, for those of us who speak English as a first language and work with English-only files, it might work just long enough to get out into production code and really break things.
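
To illustrate the failure mode, here is a small sketch in which a byte-offset seek in text mode lands inside a two-byte UTF-8 character. The helper name read_from_end and the sample content are assumptions made for the demonstration, and the behaviour shown is what CPython does today rather than something the documentation promises.

```python
import os
import tempfile


def read_from_end(path, back):
    """Seek `back` bytes before the end in text mode and read the rest.

    Returns None when the offset falls inside a multibyte character.
    """
    with open(path, 'r', encoding='utf-8') as f:
        f.seek(0, os.SEEK_END)
        f.seek(f.tell() - back, os.SEEK_SET)
        try:
            return f.read()
        except UnicodeDecodeError:
            return None


# 'café' is 4 characters but 5 bytes in UTF-8: 'é' is encoded as b'\xc3\xa9'.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('caf\u00e9'.encode('utf-8'))

print(read_from_end(tmp.name, 1))   # None: offset 4 is the second byte of 'é'
print(read_from_end(tmp.name, 2))   # 'é': offset 3 is the start of the character
os.remove(tmp.name)
```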



The following answer is based on undefined behavior

... but it does work for now for UTF-8 in Python 3.7.

You can seek backwards through a file in text mode as long as you correctly handle the UnicodeDecodeError caused by seeking to a byte that is not the start of a UTF-8 character. Since we are seeking backwards, we can simply seek back one more byte at a time until we find the start of the character.

For UTF-8 files, the result of f.tell() is still the byte position in the file, at least for now. So an f.seek() to an invalid offset will raise a UnicodeDecodeError when you subsequently call f.read(), and this can be corrected by calling f.seek() again with a different offset. At least this works for now.

E.g., seeking to the beginning of a line (just after the \n):

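import os

# Assumes f is a file already opened in text mode with a UTF-8 encoding.
pos = max(f.tell() - 1, 0)       # step back one byte, but never before the start
f.seek(pos, os.SEEK_SET)
while pos > 0:
    try:
        character = f.read(1)
        if character == '\n':
            break                # the file position is now just after the '\n'
    except UnicodeDecodeError:
        pass                     # landed inside a multibyte character; go back further
    pos -= 1
    f.seek(pos, os.SEEK_SET)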
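
The loop above can be wrapped in a function and exercised on a file that contains a multibyte character. This is only a sketch: seek_to_line_start and last_line are hypothetical names, and the code depends on the same undocumented behaviour (tell() returning byte offsets for UTF-8 text files).

```python
import os
import tempfile


def seek_to_line_start(f):
    """Move f back to the start of the current line.

    Assumes f is opened in text mode with encoding='utf-8' and that tell()
    returns byte offsets, which the documentation does not guarantee.
    """
    pos = max(f.tell() - 1, 0)
    f.seek(pos, os.SEEK_SET)
    while pos > 0:
        try:
            if f.read(1) == '\n':
                return            # positioned just after the newline
        except UnicodeDecodeError:
            pass                  # inside a multibyte character; step back further
        pos -= 1
        f.seek(pos, os.SEEK_SET)


def last_line(path):
    """Return the text after the last newline in the file."""
    with open(path, 'r', encoding='utf-8') as f:
        f.seek(0, os.SEEK_END)
        seek_to_line_start(f)
        return f.read()


# Demo: the last line contains 'é', which occupies two bytes in UTF-8.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond \u00e9 line'.encode('utf-8'))
print(last_line(tmp.name))
os.remove(tmp.name)
```

Stepping back one byte per iteration re-reads from the buffer each time, so this is fine for short hops to the previous newline but slow for scanning large files.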