Reading a log file with Pandas

Question

Asked by datascana

I have a log file that I tried to read into pandas with read_csv or read_table. Here is an example of the result:

0    date=2015-09-17    time=21:05:35     duration=0    etc...

all in a single column.

I would like to split each row, take the names (like date, time, ...) and turn them into columns, so that I get:

          date           time     duration   ...
  0    2015-09-17      21:05:35      0

Thank you!

Answer 1

Answered by Ricky Kim

I know this is an old post, but I came across the same problem and found a solution. The error "Expected n fields in line n, saw n" is probably due to each row having a different number of columns. Reading the file directly is also not a good fit if the ordering of the columns differs from row to row. Here is some sample code that converts your log to JSON and then to a pandas DataFrame.

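import io
import json

import pandas as pd

path = 'log_sample.log'

result = {}
with open(path, 'r') as log_data:
    for i, line in enumerate(log_data):
        # split() with no argument splits on any run of whitespace;
        # pass your own delimiter/separator if the log uses a different one
        columns = line.split()
        data = {}
        for c in columns:
            # split on the first '=' only, so values may contain '='
            key, value = c.split('=', 1)
            data[key] = value
        result[i] = data

j = json.dumps(result)

# wrap the string in StringIO: newer pandas versions warn on literal JSON strings
df = pd.read_json(io.StringIO(j), orient='index')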
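As a possible variation on the code above, the JSON round trip can probably be skipped, since pandas can build a DataFrame from a list of dicts and fills keys that are missing from a row with NaN. This is only a sketch: the helper name parse_log and the in-memory sample are made up for illustration, and it assumes whitespace-separated key=value tokens.

```python
import io

import pandas as pd


def parse_log(handle, sep=None):
    """Turn lines of key=value tokens into a DataFrame (hypothetical helper)."""
    rows = []
    for line in handle:
        row = {}
        for token in line.split(sep):  # sep=None splits on any whitespace
            key, found, value = token.partition("=")
            if found:  # skip tokens that are not key=value pairs
                row[key] = value
        if row:  # ignore blank lines
            rows.append(row)
    return pd.DataFrame(rows)


# Illustrative sample: the second line has a different order and a missing key
sample = io.StringIO(
    "date=2015-09-17 time=21:05:35 duration=0\n"
    "time=21:05:36 date=2015-09-17\n"
)
df = parse_log(sample)
```

With a real file, something like parse_log(open(path)) should behave the same way. All values stay strings, so any type conversion is left to the caller.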

Answer 2

Answered by datadavis2

----- Edited answer to account for inconsistent spacing:

I'm not sure what the most Pythonic approach would be, but here's a method that could work.

Using OP's data sample as an example:

0    date=2015-09-17    time=21:05:35     duration=0
1    date=2015-09-17    time=21:05:36     duration=0
2    date=2015-09-17    time=21:05:37     duration=0
3    date=2015-09-17    time=21:05:38     duration=0
4    date=2015-09-17    time=21:05:39     duration=0
5    date=2015-09-17    time=21:05:40     duration=0

I loop through each line, split it at the equals signs, and then slice out the desired text:

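import pandas as pd

split_list = []

with open('log_sample.txt', 'r') as log_data:
    for line in log_data:
        thing1 = line.split('=')
        #print(thing1)
        # each piece starts with the value of the preceding key
        date = thing1[1][:10]    # first 10 characters: YYYY-MM-DD
        time = thing1[2][:8]     # first 8 characters: HH:MM:SS
        dur = thing1[3].strip()  # drop the trailing newline

        split_list.append([date, time, dur])

df = pd.DataFrame(split_list, columns=['date', 'time', 'duration'])
df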
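The fixed-width slices above depend on every date and time having the same length. One way that might avoid this is a regular expression that pulls out every key=value pair, whatever the spacing. This is a rough sketch, assuming values never contain whitespace; the names PAIR and parse_line are made up for illustration.

```python
import re

import pandas as pd

# A word, an equals sign, then everything up to the next whitespace
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Return a dict of the key=value pairs found in one log line."""
    return dict(PAIR.findall(line))


# Illustrative lines with uneven spacing and a leading row number
lines = [
    "0    date=2015-09-17    time=21:05:35     duration=0",
    "1  date=2015-09-17 time=21:05:36   duration=12",
]
df = pd.DataFrame([parse_line(line) for line in lines])
```

The leading row number is ignored because it is not followed by an equals sign.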

----- First Answer:

As @jezrael mentions in the comments, you can use the "sep" argument of read_csv.

pd.read_csv('test.txt', sep=r'\t', engine='python') #[1]
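If the fields turn out to be separated by runs of spaces rather than tabs, a whitespace pattern for "sep" may be worth trying. The sketch below is one way to follow that up by stripping the "name=" prefixes. It assumes every row has the same fields in the same order and that the file has no leading row number; the in-memory sample stands in for the real file.

```python
import io

import pandas as pd

# Illustrative stand-in for the log file
sample = io.StringIO(
    "date=2015-09-17    time=21:05:35     duration=0\n"
    "date=2015-09-17    time=21:05:36     duration=0\n"
)

# Split on any run of whitespace; the file has no header row
raw = pd.read_csv(sample, sep=r"\s+", header=None, dtype=str)

# Take the column names from the first row's "name=" prefixes
names = raw.iloc[0].str.split("=", n=1).str[0].tolist()

# Keep only the part after the first "=" in every cell
df = raw.apply(lambda col: col.str.split("=", n=1).str[1])
df.columns = names
```

If rows can have different fields or a different order, this layout breaks down and a line-by-line parser is likely the safer choice.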

