Read log file with pandas
Original URL: http://stackoverflow.com/questions/47452552/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by datascana
I have a log file that I tried to read into pandas with read_csv or read_table. This is an example of what I got:
0 date=2015-09-17 time=21:05:35 duration=0 etc...
Everything ends up in a single column.
I would like to split each row, take the names (date, time, ...) and turn them into columns, so that I get:
date time duration ...
0 2015-09-17 21:05:35 0
Thank you!
Answer by Ricky Kim
I know this is an old post, but I came across the same problem and found a solution. The error "Expected n fields in line n, saw n" is probably caused by rows having different numbers of columns. Splitting each line into fixed columns also fails if the order of the columns differs from row to row. I wrote some sample code that converts the log into JSON and then into a pandas DataFrame.
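import io
import json
import pandas as pd
path = 'log_sample.log'
result = {}
with open(path, 'r') as log_data:  # the with-block closes the file when done
    for i, line in enumerate(log_data):
        columns = line.split()  # split on whitespace, or on whatever your delimiter/separator is
        data = {}
        for c in columns:
            key, _, value = c.partition('=')  # split only at the first '='
            data[key] = value
        result[i] = data
j = json.dumps(result)
# wrap the string in StringIO: recent pandas versions deprecate passing literal JSON to read_json
df = pd.read_json(io.StringIO(j), orient='index')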
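The JSON round-trip is optional. pd.DataFrame also accepts a list of dicts directly, and a key missing from one line simply becomes NaN in that row. The sketch below is an alternative to the answer's code, not part of it. It assumes whitespace-separated key=value pairs, and the function name parse_kv_log and the sample lines are hypothetical.

```python
import io

import pandas as pd


def parse_kv_log(lines):
    """Hypothetical helper: turn lines of whitespace-separated key=value pairs into a DataFrame.

    Each line becomes one row; a key missing from a line yields NaN in that row.
    """
    records = []
    for line in lines:
        row = {}
        for token in line.split():
            key, sep, value = token.partition('=')
            if sep:  # skip tokens that contain no '='
                row[key] = value
        if row:  # skip blank lines
            records.append(row)
    return pd.DataFrame(records)


# Hypothetical sample; the second line lists its fields in a different order.
sample = io.StringIO(
    "date=2015-09-17 time=21:05:35 duration=0\n"
    "time=21:05:36 date=2015-09-17 duration=0\n"
)
df = parse_kv_log(sample)
print(df)
```

All values come back as strings. Convert columns afterwards if needed, for example with pd.to_datetime or pd.to_numeric.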
Answer by datadavis2
----- Edited answer, to account for inconsistent spacing:
I'm not sure what the most Pythonic approach would be, but here's a method that should work.
Using the OP's data sample as an example:
0 date=2015-09-17 time=21:05:35 duration=0
1 date=2015-09-17 time=21:05:36 duration=0
2 date=2015-09-17 time=21:05:37 duration=0
3 date=2015-09-17 time=21:05:38 duration=0
4 date=2015-09-17 time=21:05:39 duration=0
5 date=2015-09-17 time=21:05:40 duration=0
I loop through each line, split it at the equals signs, then slice out the desired text:
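import pandas as pd
split_list = []
with open('log_sample.txt', 'r') as log_data:
    for line in log_data:
        if not line.strip():  # skip blank lines
            continue
        parts = line.split('=')  # e.g. ['date', '2015-09-17 time', '21:05:35 duration', '0\n']
        date = parts[1][:10]  # first 10 characters after "date=" (YYYY-MM-DD)
        time = parts[2][:8]  # first 8 characters after "time=" (HH:MM:SS)
        dur = parts[3].strip()  # rest of the line after "duration=", minus the trailing newline
        split_list.append([date, time, dur])
df = pd.DataFrame(split_list, columns=['date', 'time', 'duration'])
print(df)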
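Slicing at fixed offsets ties each value to its position on the line. The sketch below is an alternative, not the answer's method. It uses pandas' Series.str.extract with named regex groups, so each value is matched next to its key name and runs of extra spaces are tolerated. It still assumes the fields appear in the order date, time, duration, and the sample text is hypothetical. With a real file, build the Series from the file's lines instead.

```python
import pandas as pd

# Hypothetical sample with irregular spacing and a multi-digit duration.
text = (
    "date=2015-09-17  time=21:05:35 duration=0\n"
    "date=2015-09-17 time=21:05:36   duration=12\n"
)
lines = pd.Series(text.splitlines())

# Each named group (?P<name>...) becomes a column of that name.
pattern = r"date=(?P<date>\S+)\s+time=(?P<time>\S+)\s+duration=(?P<duration>\S+)"
df = lines.str.extract(pattern)
print(df)
```

Lines that do not match the pattern come back as all-NaN rows, so malformed lines can be located with df.isna().any(axis=1).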
----- First Answer:
As @jezrael mentions in the comments, you can use the sep argument of read_csv.
pd.read_csv('test.txt', sep=r'\t', engine='python') #[1]
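On its own, sep only splits a line into fields. For the OP's format, each field would still hold a whole key=value pair. The sketch below carries the sep idea one step further, under the assumption that every line lists the same keys in the same order. A regular-expression separator splits on both whitespace and '=', so keys and values land in alternating columns, and the column names are taken from the first row. The sample text is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample; every line must list the same keys in the same order.
text = (
    "date=2015-09-17 time=21:05:35 duration=0\n"
    "date=2015-09-17 time=21:05:36 duration=0\n"
)

# A regex separator requires the python engine; this one splits on runs of whitespace or on '='.
raw = pd.read_csv(io.StringIO(text), sep=r"\s+|=", engine="python", header=None)

# Even-numbered columns hold the keys, odd-numbered columns hold the values.
names = raw.iloc[0, ::2].tolist()
df = raw.iloc[:, 1::2].set_axis(names, axis=1)
print(df)
```

If the key order can vary between lines, this positional trick breaks, and a per-line key=value parse is the safer choice.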
See: