Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/3299213/

Time: 2020-08-18 10:19:41  Source: igfitidea

Python - How can I open a file and specify the offset in bytes?

python file-io byte offset

Asked by dave

I'm writing a program that will parse an Apache log file periodically to log its visitors, bandwidth usage, etc.

The problem is, I don't want to open the log and parse data I've already parsed. For example:

line1
line2
line3

If I parse that file, I'll save all the lines then save that offset. That way, when I parse it again, I get:

line1
line2
line3 - The log will open from this point
line4
line5

Second time round, I'll get line4 and line5. Hopefully this makes sense...

What I need to know is, how do I accomplish this? Python has the seek() function to specify the offset... So do I just get the filesize of the log (in bytes) after parsing it then use that as the offset (in seek()) the second time I log it?

I can't seem to think of a way to code this >.<

Accepted answer by luc

You can manage the position in the file thanks to the seek and tell methods of the file class; see https://docs.python.org/2/tutorial/inputoutput.html

The tell method will tell you where to seek to the next time you open the file.
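
A minimal sketch of that idea, assuming Python 3 and a small side file that remembers the offset between runs. The function name read_new_lines and both paths are hypothetical, not part of any library:

```python
import os


def read_new_lines(log_path, offset_path):
    """Return the lines appended to log_path since the previous call.

    The byte offset reached is remembered in offset_path (a hypothetical
    side file; any persistent storage would do).
    """
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as offset_file:
            offset = int(offset_file.read().strip() or 0)

    # Binary mode keeps seek() and tell() in plain byte units.
    with open(log_path, 'rb') as log:
        log.seek(offset)
        new_lines = log.readlines()
        offset = log.tell()

    with open(offset_path, 'w') as offset_file:
        offset_file.write(str(offset))

    return [line.decode('utf-8', errors='replace') for line in new_lines]
```

Calling it twice in a row on an unchanged log should give an empty list the second time, since the saved offset already points at the end of the file.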

Answer by Vinko Vrsalovic

If your logfiles fit easily in memory (that is, you have a reasonable rotation policy), you can easily do something like:

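# get_last_lineprocessed, parse_log and store_last_lineprocessed are placeholders you supply
with open('logfile', 'r') as log:
    log_lines = log.readlines()
last_line = get_last_lineprocessed()  # from some persistent storage
last_line = parse_log(log_lines[last_line:])  # returns the new last line number
store_last_lineprocessed(last_line)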

If you cannot do this, you can use something like the approach in "Get last n lines of a file with Python, similar to tail" (see the accepted answer's use of seek and tell, in case you need to do it with them).
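
For illustration only, here is one way a tail-like read could look. This is a rough sketch, not the code from the linked question; the function name tail and the block_size parameter are assumptions:

```python
import os


def tail(path, n, block_size=4096):
    """Return the last n lines of path as a list of bytes objects."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b''
        # Read backwards block by block until enough newlines are buffered.
        while position > 0 and data.count(b'\n') <= n:
            step = min(block_size, position)
            position -= step
            f.seek(position)
            data = f.read(step) + data
    return data.splitlines()[-n:]
```

It only reads as many blocks from the end as it needs, so the whole log never has to fit in memory.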

Answer by Guillaume Lebourgeois

If you're parsing your log line by line, you could just save the line number from the last parse. You would then just have to start reading from the right line the next time.

Seeking is more useful when you have to be in a very specific place in the file.
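
A small sketch of the line-number approach, assuming you store the returned index yourself between runs. The names parse_from_line and handle_line are made up for this example:

```python
from itertools import islice


def parse_from_line(path, first_unparsed_line, handle_line):
    """Call handle_line on every line from index first_unparsed_line on.

    Returns the line index to resume from next time.
    """
    count = first_unparsed_line
    with open(path) as log:
        # islice still reads the skipped lines; it just discards them.
        for line in islice(log, first_unparsed_line, None):
            handle_line(line)
            count += 1
    return count
```

Note the trade-off: the skipped lines are still read from disk on every run, which is why seeking to a byte offset tends to scale better for large logs.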

Answer by systempuntoout

Easy but not recommended :):

last_line_processed = get_last_line_processed()  # placeholder: load from your own storage
with open('file.log') as log:
    for record_number, record in enumerate(log):
        if record_number >= last_line_processed:
            parse_log(record)

Answer by Wayne Werner

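# First run: read one line, then remember where we stopped
log = open('myfile.log')
pos = open('pos.dat','w')
print(log.readline())
pos.write(str(log.tell()))
log.close()
pos.close()

# Next run: jump back to the saved offset and carry on
log = open('myfile.log')
pos = open('pos.dat')
log.seek(int(pos.readline()))
print(log.readline())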

Of course you shouldn't use it like that - you should wrap the operations up in functions like save_position(myfile) and load_position(myfile), but the functionality is all there.
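
One possible shape for those two helpers, as a sketch only. The extra position_path parameter is an assumption added here so the helpers know where to store the offset; the answer itself only names the functions:

```python
def save_position(log_file, position_path):
    """Write the current offset of the open file log_file to position_path."""
    with open(position_path, 'w') as position_file:
        position_file.write(str(log_file.tell()))


def load_position(log_file, position_path):
    """Seek log_file to the offset stored in position_path, if there is one."""
    try:
        with open(position_path) as position_file:
            log_file.seek(int(position_file.read()))
    except (IOError, ValueError):
        # No usable saved position yet: start from the beginning.
        log_file.seek(0)
```

Typical use would be to call load_position right after opening the log and save_position just before closing it.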

Answer by user106514

Note that you can seek() in python from the end of the file:

f.seek(-3, os.SEEK_END)

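puts the read position 3 bytes (not 3 lines) before EOF. In Python 3 this only works on a file opened in binary mode, since text-mode files reject non-zero end-relative seeks.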
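
A quick way to check what an end-relative seek does, assuming Python 3 and a throwaway demo file (the path and contents are made up for this example):

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'seek_demo.log')
with open(demo_path, 'wb') as f:
    f.write(b'line1\nline2\nline3\n')

# Binary mode: the offset is counted in bytes from the end of the file.
with open(demo_path, 'rb') as f:
    f.seek(-3, os.SEEK_END)
    last_bytes = f.read()

print(last_bytes)  # b'e3\n'
```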

However, why not use diff, either from the shell or with difflib?
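
As a rough illustration of the difflib route, assuming you keep the previously parsed lines around to compare against (added_lines is a made-up helper name):

```python
import difflib


def added_lines(old_lines, new_lines):
    """Return the lines that difflib reports as added in new_lines."""
    diff = difflib.ndiff(old_lines, new_lines)
    # ndiff prefixes added lines with '+ '.
    return [entry[2:] for entry in diff if entry.startswith('+ ')]


old = ['line1\n', 'line2\n', 'line3\n']
new = old + ['line4\n', 'line5\n']
print(added_lines(old, new))  # ['line4\n', 'line5\n']
```

This compares the whole file each time, so it is likely to cost more than seeking to a saved offset on a large log.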

Answer by Tony Veijalainen

Here is code demonstrating both your length suggestion and the tell method:

beginning="""line1
line2
line3"""

end="""- The log will open from this point
line4
line5"""

openfile= open('log.txt','w')
openfile.write(beginning)
endstarts=openfile.tell()
openfile.close()

open('log.txt','a').write(end)
print(open('log.txt').read())

print("\nAgain:")
end2 = open('log.txt','r')
end2.seek(len(beginning))

print(end2.read())  # offset is two bytes short on Windows, where each '\n' is stored as '\r\n'
end2.seek(endstarts)

print("\nOk in Windows also")
print(end2.read())
end2.close()

Answer by Peter Lundberg

Here is an efficient and safe snippet that does this, saving the offset read so far in a parallel file. Basically logtail in Python.

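import os

# OFFSET_ROOT_DIR, filename and new_logrows_handler are placeholders you supply
with open(filename) as log_fd:
    offset_filename = os.path.join(OFFSET_ROOT_DIR, filename)
    if not os.path.exists(offset_filename):
        offset_dir = os.path.dirname(offset_filename)
        if not os.path.isdir(offset_dir):
            os.makedirs(offset_dir)
        with open(offset_filename, 'w') as offset_fd:
            offset_fd.write(str(0))
    with open(offset_filename, 'r+') as offset_fd:
        # An empty offset file counts as offset 0
        log_fd.seek(int(offset_fd.readline() or 0))
        new_logrows_handler(log_fd.readlines())
        offset_fd.seek(0)
        offset_fd.write(str(log_fd.tell()))
        offset_fd.truncate()  # drop leftover digits from a longer old offset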