Python: reading a CSV with a timestamp column using pandas
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/34122395/
Reading a csv with a timestamp column, with pandas
Asked by Basj
When doing:
import pandas
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime',
                    names=['DateTime', 'X'], header=None, sep=';')
with this data.csv file:
1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12
(the 1st column is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 seconds with x.resample('15S'):
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex
It's like the "datetime" information has not been parsed:
X
DateTime
1.449054e+09 15.31
1.449054e+09 16.19
...
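You can confirm that nothing was parsed by inspecting the index type; a minimal sketch, assuming the same data.csv as above (the exact index class printed depends on your pandas version):

import pandas

x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime',
                    names=['DateTime', 'X'], header=None, sep=';')

# The index is still numeric (float seconds), not a DatetimeIndex,
# which is why x.resample('15S') raises the TypeError above.
print(type(x.index))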
How can I import a .CSV whose dates are stored as UNIX timestamps with the pandas module?
Then, once I am able to import the CSV, how can I access the rows for which the date is > 2015-12-02 12:02:18?
Accepted answer by Budo Zindovic
My solution was similar to Mike's:
import pandas
import datetime

# Convert a UNIX timestamp (seconds since the epoch) to a datetime object
def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))

x = pandas.read_csv('data.csv', delimiter=';', parse_dates=True,
                    date_parser=dateparse, index_col='DateTime',
                    names=['DateTime', 'X'], header=None)
# Keep only the rows dated after 2015-12-02 12:02:18
out = x.truncate(before=datetime.datetime(2015, 12, 2, 12, 2, 18))
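With the index parsed to datetimes, the resampling that originally failed should now work; a minimal sketch building on the code above (recent pandas versions return a Resampler object, so an explicit aggregation such as mean() is needed):

# x.index is now a DatetimeIndex, so resampling no longer raises TypeError
resampled = x.resample('15S').mean()
print(resampled)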
Answered by Mike Müller
You can parse the date yourself:
import time
import pandas as pd

# Turn each timestamp string into a human-readable date string
def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv('data.csv', parse_dates=[0], sep=';',
                 date_parser=date_parser,
                 index_col='DateTime',
                 names=['DateTime', 'X'], header=None)
The result:
>>> df
X
DateTime
2015-12-02 12:02:16 15.31
2015-12-02 12:02:17 16.19
2015-12-02 12:02:18 19.22
2015-12-02 12:02:18 15.12
2015-12-02 12:02:19 13.12
Answered by EdChum
Use to_datetime and pass unit='s' to parse the values as unix timestamps; this will be much faster:
In [7]:
pd.to_datetime(df.index, unit='s')
Out[7]:
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
'2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
'2015-12-02 11:02:19.250000'],
dtype='datetime64[ns]', name=0, freq=None)
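Applied to the original file, the whole load can be written without a custom parser; a sketch assuming the same data.csv and column layout as in the question:

import pandas as pd

# Read the raw numeric timestamps, then convert the index in one vectorised call
df = pd.read_csv('data.csv', header=None, sep=';',
                 names=['DateTime', 'X'], index_col='DateTime')
df.index = pd.to_datetime(df.index, unit='s')

Note that unit='s' interprets the values as seconds since the UTC epoch and yields naive UTC timestamps (11:02 above), whereas datetime.fromtimestamp and time.ctime in the other answers convert to local time (12:02), which explains the one-hour difference between the outputs.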
Timings:
In [9]:
%%timeit
import time
def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv(io.StringIO(t), parse_dates=[0], sep=';',
                 date_parser=date_parser,
                 index_col='DateTime',
                 names=['DateTime', 'X'], header=None)
100 loops, best of 3: 4.07 ms per loop
and
In [12]:
%%timeit
t="""1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
df.index = pd.to_datetime(df.index, unit='s')
100 loops, best of 3: 1.69 ms per loop
So using to_datetime is over 2x faster on this small dataset; I expect it to scale much better than the other methods.