Original question: http://stackoverflow.com/questions/40305122/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Log file to Pandas Dataframe
Asked by ukbaz
I have log files that contain many lines of the form:
LogLevel [13/10/2015 00:30:00.650] [Message Text]
My goal is to convert each line in the log file into a row of a tidy DataFrame. I have tried to do that by splitting the lines on the [ character, but I am still not getting a neat DataFrame.
My code:
level = []
time = []
text = []
with open(filename) as inf:
    for line in inf:
        parts = line.split('[')
        if len(parts) > 1:
            level = parts[0]
            time = parts[1]
            text = parts[2]
            print (parts[0],parts[1],parts[2])
s1 = pd.Series({'Level':level, 'Time': time, 'Text':text})
df = pd.DataFrame(s1).reset_index()
Here's my printed data frame:
Info 10/08/16 10:56:09.843] In Function CCatalinaPrinter::ItemDescription()]
Info 10/08/16 10:56:09.843] Sending UPC Description Message ]
How can I improve this to strip the whitespace and the stray ']' character?
Thank you
Answered by jezrael
You can use read_csv with the separator \s*\[ (optional whitespace followed by [):
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

temp=u"""LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]
LogLevel [13/10/2015 00:30:00.650] [Message Text]"""
# after testing, replace StringIO(temp) with the filename
# raw string so the regex backslashes are not treated as string escapes
df = pd.read_csv(StringIO(temp), sep=r"\s*\[", names=['Level','Time','Text'], engine='python')
Then remove the trailing ] with strip and convert the Time column with to_datetime:
df.Time = pd.to_datetime(df.Time.str.strip(']'), format='%d/%m/%Y %H:%M:%S.%f')
df.Text = df.Text.str.strip(']')
print (df)
      Level                    Time          Text
0  LogLevel 2015-10-13 00:30:00.650  Message Text
1  LogLevel 2015-10-13 00:30:00.650  Message Text
2  LogLevel 2015-10-13 00:30:00.650  Message Text
3  LogLevel 2015-10-13 00:30:00.650  Message Text
print (df.dtypes)
Level            object
Time     datetime64[ns]
Text             object
dtype: object
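If the message text can itself contain a [ character, splitting on \s*\[ would produce extra fields. As a hedged alternative (not part of the answer above), here is a minimal sketch that matches each whole line with one regular expression and named groups via Series.str.extract. The sample lines and the pattern are assumptions made up for illustration, based on the line format in the question.

```python
import io

import pandas as pd

# Made-up sample lines in the "Level [timestamp] [text]" format from the question.
sample = io.StringIO(
    "Info [13/10/2015 00:30:00.650] [Sending UPC Description Message ]\n"
    "Error [13/10/2015 00:30:01.000] [Index [3] out of range]\n"
)

# One raw line per row; the named groups become the column names.
lines = pd.Series(sample.read().splitlines())
pattern = r"^(?P<Level>\S+)\s*\[(?P<Time>[^\]]+)\]\s*\[(?P<Text>.*)\]\s*$"
df = lines.str.extract(pattern)

# Trim leftover whitespace and parse the timestamp (day first, four-digit year).
df["Text"] = df["Text"].str.strip()
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y %H:%M:%S.%f")
print(df)
```

Because the brackets are consumed by the pattern itself, no separate strip(']') step is needed, and lines that do not match the pattern come back as rows of NaN that can be inspected or dropped.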
Answered by jxramos
I had to parse mine manually, since my separator showed up in my message body and the message body could span multiple lines as well, e.g. if an exception was thrown from my Flask application and the stack trace was recorded.
Here's my log creation format...
logging.basicConfig(
    filename="%s/%s_MyApp.log" % ( Utilities.logFolder , datetime.datetime.today().strftime("%Y%m%d-%H%M%S")) ,
    level=logging.DEBUG,
    format="%(asctime)s,%(name)s,%(process)s,%(levelno)u,%(message)s",
    datefmt="%Y-%m-%d %H:%M:%S" )
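To see what a line written with this format looks like, and why a plain split on commas is not enough, here is a small sketch that applies the same format string to an in-memory stream instead of a file. The logger name MyApp.demo and the message are made up for illustration.

```python
import io
import logging

# Same format and date format as the basicConfig call, but written to memory.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s,%(name)s,%(process)s,%(levelno)u,%(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))

logger = logging.getLogger("MyApp.demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

# The message itself contains a comma, just like the separator.
logger.warning("disk usage at %d%%, cleaning up", 91)

line = stream.getvalue()
# Limiting the split to 4 keeps the comma inside the message intact.
fields = line.rstrip("\n").split(",", 4)
print(fields)
```

The first four fields are the timestamp, logger name, process ID, and numeric level; everything after the fourth comma is the message.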
And the parsing code in my Utilities module
Utilities.py
import re
import pandas

logFolder = "./Logs"
logLevelToString = { "50" : "CRITICAL",
                     "40" : "ERROR" ,
                     "30" : "WARNING" ,
                     "20" : "INFO" ,
                     "10" : "DEBUG" ,
                     "0" : "NOTSET" } # https://docs.python.org/3.6/library/logging.html#logging-levels

def logFile2DataFrame( filePath ) :
    dfLog = pandas.DataFrame( columns=[ 'Timestamp' , 'Module' , 'ProcessID' , 'Level' , 'Message' ] )
    tsPattern = "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},"
    with open( filePath , 'r' ) as logFile :
        numRows = -1
        for line in logFile :
            if re.search( tsPattern , line ) :
                tokens = line.split(",")
                timestamp = tokens[0]
                module = tokens[1]
                processID = tokens[2]
                level = logLevelToString[ tokens[3] ]
                message = ",".join( tokens[4:] )
                numRows += 1
                dfLog.loc[ numRows ] = [ timestamp , module , processID , level , message ]
            else :
                # Multiline message, integrate it into last record
                dfLog.loc[ numRows , 'Message' ] += line
    return dfLog
I actually created this helper function so I can view my logs directly from my Flask app, since I have a handy template that renders a DataFrame. It should speed up debugging quite a bit, because wrapping the Flask app in a Tornado WSGI server prevents Flask's normal debug page from being displayed when an exception is thrown. If anyone knows how to restore that functionality in such a setup, please share.
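The parser above grows the DataFrame one row at a time with .loc, which can become slow on large logs. As a hedged variant (not the answerer's code), the sketch below collects rows in a plain list and builds the DataFrame once at the end. The names log_lines_to_dataframe, TS_PATTERN, and COLUMNS are hypothetical, the sample lines are made up, and it assumes every timestamped line has all five comma-separated fields. It also differs in two small ways: it uses logging.getLevelName instead of a lookup dictionary, and it strips trailing newlines from messages.

```python
import logging
import re

import pandas as pd

# A record starts with "YYYY-MM-DD HH:MM:SS," as in the format shown above.
TS_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},")
COLUMNS = ["Timestamp", "Module", "ProcessID", "Level", "Message"]


def log_lines_to_dataframe(lines):
    """Build a DataFrame from an iterable of log lines (e.g. an open file)."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if TS_PATTERN.match(line):
            # maxsplit=4 keeps commas inside the message in one field.
            row = line.split(",", 4)
            row[3] = logging.getLevelName(int(row[3]))  # e.g. 40 -> "ERROR"
            rows.append(row)
        elif rows:
            # Continuation of a multiline message (e.g. a stack trace).
            rows[-1][4] += "\n" + line
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up sample lines in the assumed format.
sample_lines = [
    "2017-03-01 12:00:00,werkzeug,4242,20,Started, listening on port 5000\n",
    "2017-03-01 12:00:05,MyApp,4242,40,Unhandled exception\n",
    "Traceback (most recent call last):\n",
    "ZeroDivisionError: division by zero\n",
]
df = log_lines_to_dataframe(sample_lines)
print(df)
```

Since the function accepts any iterable of lines, it can be called as log_lines_to_dataframe(open(path)) inside a with block. Continuation lines that appear before the first timestamped record are silently skipped rather than raising an error.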