Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/47452552/

Date: 2020-09-14 04:48:54  Source: igfitidea

Read log file with pandas

Tags: python, database, pandas, dataframe, logging

Asked by datascana

I have a log file that I tried to read in pandas with read_csv or read_table. I've got this example of results:

0    date=2015-09-17    time=21:05:35     duration=0    etc...

on 1 column.

I would like to split each row, take the names (like date, time, ...) and convert them to columns so I would get:

          date           time     duration   ...
  0    2015-09-17      21:05:35      0              

Thank you!

Answered by Ricky Kim

I know this is an old post, but I came across the same problem and found a solution. The error "Expected n fields in line n, saw n" is probably caused by rows having different numbers of columns. Reading the file with read_csv is also a poor fit if the order of the columns differs from row to row. Here is sample code that converts your log to JSON and then to a pandas DataFrame.

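import io
import json
import pandas as pd

path = 'log_sample.log'

result = {}
with open(path, 'r') as log_data:
    for i, line in enumerate(log_data):
        # split() with no argument splits on any run of whitespace;
        # pass your own delimiter if the log uses a different one
        data = {}
        for c in line.split():
            if '=' not in c:
                continue  # skip tokens that are not key=value pairs
            key, value = c.split('=', 1)
            data[key] = value
        result[i] = data

# serialize to JSON, then let pandas build one row per log line
j = json.dumps(result)
df = pd.read_json(io.StringIO(j), orient='index')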
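
As a possible variation on the code above, the JSON round trip can be skipped by handing pandas a list of dicts directly. This is only a minimal sketch: the helper name parse_line and the two sample lines are made up for illustration and are not part of the original answer. It shows how rows with a different column order or a missing key still line up by name, with the missing value left as NaN.

```python
import pandas as pd

def parse_line(line):
    """Turn 'k1=v1 k2=v2' into {'k1': 'v1', 'k2': 'v2'} (hypothetical helper)."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition('=')
        if sep:  # ignore tokens that contain no '='
            record[key] = value
    return record

# Made-up sample lines: the second one swaps the order and omits duration
sample_lines = [
    "date=2015-09-17 time=21:05:35 duration=0",
    "time=21:05:36 date=2015-09-17",
]

# pandas aligns the dicts by key, so column order in the log does not matter
df_sample = pd.DataFrame([parse_line(line) for line in sample_lines])
```

Values stay as strings with this approach, so any numeric or datetime conversion would still have to be done afterwards.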

Answered by datadavis2

----- Editing answer to account for inconsistent spacing:

Not sure what the pythonic approach should be, but here's a method that could work.

Using OP's data sample as an example:

0    date=2015-09-17    time=21:05:35     duration=0
1    date=2015-09-17    time=21:05:36     duration=0
2    date=2015-09-17    time=21:05:37     duration=0
3    date=2015-09-17    time=21:05:38     duration=0
4    date=2015-09-17    time=21:05:39     duration=0
5    date=2015-09-17    time=21:05:40     duration=0

I loop through each line and split at the equals sign, then grab the desired text:

import pandas as pd

split_list = []

with open('log_sample.txt', 'r') as log_data:
    for line in log_data:
        # splitting on '=' leaves each value at the start of the next piece
        thing1 = line.split('=')
        date = thing1[1][:10]     # YYYY-MM-DD
        time = thing1[2][:8]      # HH:MM:SS
        dur = thing1[3].strip()   # drop the trailing newline

        split_list.append([date, time, dur])

df = pd.DataFrame(split_list, columns=['date', 'time', 'duration'])
df
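
The slicing above relies on the date and time values having fixed widths. One alternative, sketched here under the assumption that values never contain whitespace, is to pull out every key=value pair with a regular expression, which tolerates inconsistent spacing and the leading row number. The names PAIR, parse_pairs and the sample lines are illustrative only.

```python
import re
import pandas as pd

# One or more word characters, '=', then a run of non-whitespace characters
PAIR = re.compile(r'(\w+)=(\S+)')

def parse_pairs(line):
    """Return a dict of all key=value pairs found in one log line."""
    return dict(PAIR.findall(line))

# Sample lines modeled on the OP's data, with uneven spacing
sample = [
    "0    date=2015-09-17    time=21:05:35     duration=0",
    "1    date=2015-09-17  time=21:05:36 duration=0",
]

df_pairs = pd.DataFrame([parse_pairs(s) for s in sample])
```

The leading index (0, 1, ...) is dropped automatically because it is not followed by an equals sign.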

----- First Answer:

As @jezrael mentions in the comments, you can leverage the "sep" argument within read_csv.

pd.read_csv('test.txt', sep=r'\t', engine='python') #[1]
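
If the fields are separated by a varying number of spaces rather than tabs, a whitespace regex such as r'\s+' may work as the separator. The sketch below is an assumption-laden illustration, not the original answer's code: it uses an in-memory sample instead of test.txt, and it only works when every row has the same keys in the same order. After reading, the key= prefix is stripped from each cell and the keys from the first row become the column names.

```python
import io
import pandas as pd

# In-memory stand-in for the log file (made-up sample based on the OP's data)
raw = io.StringIO(
    "date=2015-09-17    time=21:05:35     duration=0\n"
    "date=2015-09-17    time=21:05:36     duration=0\n"
)

# Split on any run of whitespace and keep every cell as a string
df_raw = pd.read_csv(raw, sep=r'\s+', header=None, dtype=str)

# Column names come from the text before '=' in the first row
names = [cell.split('=', 1)[0] for cell in df_raw.iloc[0]]

# Keep only the text after the first '=' in every cell
df_clean = df_raw.apply(lambda col: col.str.split('=', n=1).str[1])
df_clean.columns = names
```

For logs where keys can be missing or reordered, parsing line by line into dicts is the safer route.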

See:
