Original source: http://stackoverflow.com/questions/3299213/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python - How can I open a file and specify the offset in bytes?
Asked by dave
I'm writing a program that will periodically parse an Apache log file to record its visitors, bandwidth usage, etc.
The problem is, I don't want to open the log and parse data I've already parsed. For example:
line1
line2
line3
If I parse that file, I'll save all the lines then save that offset. That way, when I parse it again, I get:
line1
line2
line3 - The log will open from this point
line4
line5
Second time round, I'll get line4 and line5. Hopefully this makes sense...
What I need to know is: how do I accomplish this? Python has the seek() function to specify the offset. So do I just get the file size of the log (in bytes) after parsing it, then use that as the offset (in seek()) the second time I parse it?
I can't seem to think of a way to code this >.<
Accepted answer by luc
You can manage the position in the file with the seek and tell methods of the file class; see
https://docs.python.org/2/tutorial/inputoutput.html
The tell method will tell you where to seek the next time you open the file.
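As a rough illustration of the seek/tell approach, here is a minimal sketch in Python 3 syntax. The function name read_new_lines and the demo file are assumptions made for this example, not part of the answer. The log is opened in binary mode so that the offset is a plain byte count.

```python
import os
import tempfile


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for everything after `offset` bytes.

    Hypothetical helper, for illustration only.
    """
    with open(path, 'rb') as log:
        log.seek(offset)          # jump to where the last run stopped
        lines = log.read().decode('utf-8').splitlines()
        return lines, log.tell()  # tell() is the offset to store for next time


# Demo on a throwaway file
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.log')
with open(demo_path, 'w') as f:
    f.write('line1\nline2\nline3\n')
first, offset = read_new_lines(demo_path)

with open(demo_path, 'a') as f:
    f.write('line4\nline5\n')
second, offset = read_new_lines(demo_path, offset)

print(first)   # lines seen on the first pass
print(second)  # only the lines appended since then
```

In a real program the returned offset would be written to persistent storage between runs rather than kept in a variable.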
Answer by Vinko Vrsalovic
If your log files fit easily in memory (that is, you have a reasonable rotation policy), you can easily do something like:
with open('logfile', 'r') as log_file:
    log_lines = log_file.readlines()
last_line = get_last_lineprocessed()  # from some persistent storage
last_line = parse_log(log_lines[last_line:])  # assumed to return the new last processed line number
store_last_lineprocessed(last_line)
If you cannot do this, you can use something like the approach in "Get last n lines of a file with Python, similar to tail" (see the accepted answer's use of seek and tell, in case you need to do it with them).
Answer by Guillaume Lebourgeois
If you're parsing your log line by line, you could just save the line number from the last parse. The next time, you would then just have to start reading from the right line.
Seeking is more useful when you need to be at a very specific place in the file.
Answer by systempuntoout
Easy but not recommended :):
last_line_processed = get_last_line_processed()
with open('file.log') as log:
    for record_number, record in enumerate(log):
        # skip records that were handled on a previous run
        if record_number >= last_line_processed:
            parse_log(record)
Answer by Wayne Werner
# First run: read a line and save the position
log = open('myfile.log')
pos = open('pos.dat', 'w')
print(log.readline())
pos.write(str(log.tell()))
log.close()
pos.close()

# Later run: restore the position and continue reading
log = open('myfile.log')
pos = open('pos.dat')
log.seek(int(pos.readline()))
print(log.readline())
Of course you shouldn't use it like that. You should wrap the operations up in functions like save_position(myfile) and load_position(myfile), but the functionality is all there.
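One possible way to wrap this up, as a sketch only: the answer names save_position and load_position but does not show them, so the signatures below, including the extra pos_path argument, are assumptions. The example uses Python 3 syntax and throwaway files.

```python
import os
import tempfile


def save_position(myfile, pos_path):
    """Write the current offset of the open file `myfile` to `pos_path`."""
    with open(pos_path, 'w') as pos:
        pos.write(str(myfile.tell()))


def load_position(myfile, pos_path):
    """Seek `myfile` to the offset stored in `pos_path`, if there is one."""
    if os.path.exists(pos_path):
        with open(pos_path) as pos:
            myfile.seek(int(pos.read().strip() or 0))


# Demo with throwaway files
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, 'myfile.log')
pos_path = os.path.join(workdir, 'pos.dat')
with open(log_path, 'wb') as f:
    f.write(b'line1\nline2\n')

with open(log_path, 'rb') as log:
    load_position(log, pos_path)   # no saved position yet, so start at 0
    first = log.readline()
    save_position(log, pos_path)

with open(log_path, 'rb') as log:
    load_position(log, pos_path)   # resume after the first line
    second = log.readline()

print(first, second)
```

Keeping the position file path as an explicit argument is just one design choice; deriving it from the log file name would work equally well.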
Answer by user106514
Answer by Tony Veijalainen
Here is code that tests both your length suggestion and the tell method:
beginning="""line1
line2
line3"""
end="""- The log will open from this point
line4
line5"""
openfile = open('log.txt', 'w')
openfile.write(beginning)
endstarts = openfile.tell()
openfile.close()
open('log.txt', 'a').write(end)
print(open('log.txt').read())
print("\nAgain:")
end2 = open('log.txt', 'r')
end2.seek(len(beginning))
print(end2.read())  # offset is two too small on Windows, where each newline is stored as \r\n
end2.seek(endstarts)
print("\nOk in Windows also")
print(end2.read())
end2.close()
Answer by Peter Lundberg
Here is an efficient and safe snippet that does this, saving the read offset in a parallel file. Basically logtail in Python.
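import os

with open(filename) as log_fd:
    offset_filename = os.path.join(OFFSET_ROOT_DIR, filename)
    if not os.path.exists(offset_filename):
        # create the offset file, and its directory if needed, starting at 0
        offset_dir = os.path.dirname(offset_filename)
        if not os.path.isdir(offset_dir):
            os.makedirs(offset_dir)
        with open(offset_filename, 'w') as offset_fd:
            offset_fd.write(str(0))
    with open(offset_filename, 'r+') as offset_fd:
        # resume from the stored offset (0 if the offset file is empty)
        log_fd.seek(int(offset_fd.readline() or 0))
        new_logrows_handler(log_fd.readlines())
        # overwrite the stored offset with the new position
        offset_fd.seek(0)
        offset_fd.write(str(log_fd.tell()))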

