Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34122395/

Reading a csv with a timestamp column, with pandas

Tags: python, csv, pandas

Asked by Basj

When doing:

import pandas
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', 
                                names=['DateTime', 'X'], header=None, sep=';')

with this data.csv file:

1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12

(the first column is a UNIX timestamp, i.e. seconds elapsed since 1970-01-01), I get this error when resampling the data every 15 seconds with x.resample('15S'):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex

It's like the "datetime" information has not been parsed:

                 X
DateTime      
1.449054e+09  15.31                
1.449054e+09  16.19
...

How can I import a CSV whose dates are stored as UNIX timestamps with the pandas module?

Then, once the CSV is imported, how can I access the rows for which the date is later than 2015-12-02 12:02:18?

Accepted answer by Budo Zindovic

My solution was similar to Mike's:

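import datetime
import pandas

def dateparse(time_in_secs):
    # Convert a UNIX timestamp (seconds since 1970-01-01) to a datetime in local time.
    return datetime.datetime.fromtimestamp(float(time_in_secs))

# Note: the date_parser argument is deprecated since pandas 2.0.
x = pandas.read_csv('data.csv', delimiter=';', parse_dates=True,
                    date_parser=dateparse, index_col='DateTime',
                    names=['DateTime', 'X'], header=None)

# Keep only the rows from 2015-12-02 12:02:18 onwards (the bound is inclusive).
out = x.truncate(before=datetime.datetime(2015, 12, 2, 12, 2, 18))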
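
As a hedged aside on the second part of the question: truncate(before=...) keeps rows at or after the bound, whereas the question asks for rows strictly later than it. The sketch below shows a boolean mask on the index as one possible alternative. It reads the sample data from an in-memory string rather than data.csv, converts the index with to_datetime, and expresses the cutoff in UTC (11:02:18), on the assumption that the asker's 12:02:18 was local time at UTC+1. The names raw, cutoff, later and from_cutoff are illustrative only.

```python
import io

import pandas as pd

# The sample rows from the question, held in memory for a self-contained example.
raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

x = pd.read_csv(io.StringIO(raw), sep=';', header=None,
                names=['DateTime', 'X'], index_col='DateTime')

# Interpret the float index as seconds since the UNIX epoch (UTC-based, timezone-naive).
x.index = pd.to_datetime(x.index, unit='s')

# Assumed cutoff: 12:02:18 local time at UTC+1, i.e. 11:02:18 in UTC.
cutoff = pd.Timestamp('2015-12-02 11:02:18')

later = x[x.index > cutoff]              # strictly later than the cutoff
from_cutoff = x.truncate(before=cutoff)  # at or after the cutoff (needs a sorted index)

print(later)
```

With this sample, both selections return the same three rows because no row falls exactly on the cutoff; they would differ only for a row stamped exactly 11:02:18.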

Answer by Mike Müller

You can parse the date yourself:

import time
import pandas as pd

def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv('data.csv', parse_dates=[0],  sep=';', 
                 date_parser=date_parser, 
                 index_col='DateTime', 
                 names=['DateTime', 'X'], header=None)

The result:

>>> df
                        X
DateTime                  
2015-12-02 12:02:16  15.31
2015-12-02 12:02:17  16.19
2015-12-02 12:02:18  19.22
2015-12-02 12:02:18  15.12
2015-12-02 12:02:19  13.12
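
Note that 12:02:18 appears twice in this result. A likely reason is that time.ctime formats to whole seconds, so the fractional part of each timestamp is dropped before pandas parses the string. The small sketch below illustrates this with one value from the sample data; the comparison with datetime.fromtimestamp is added here for illustration and is not part of the original answer.

```python
import time
from datetime import datetime, timezone

stamp = 1449054138.65  # one of the sample timestamps

# time.ctime returns a string such as 'Wed Dec  2 ...' in local time, with no fractional seconds.
as_text = time.ctime(stamp)

# datetime.fromtimestamp keeps the fraction as microseconds (UTC requested explicitly here).
as_datetime = datetime.fromtimestamp(stamp, tz=timezone.utc)

print(as_text)
print(as_datetime.isoformat())
```

If sub-second resolution matters, for example to keep the index unique, a conversion that preserves the fraction is probably the safer choice.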

Answer by EdChum

Use to_datetime and pass unit='s' to parse the values as UNIX timestamps; this will be much faster:

In [7]:
pd.to_datetime(df.index, unit='s')

Out[7]:
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
               '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
               '2015-12-02 11:02:19.250000'],
              dtype='datetime64[ns]', name=0, freq=None)
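
To connect this back to the original error, here is a minimal sketch of the whole path: read the file with a plain numeric index, convert the index with to_datetime, then resample. It uses the question's sample data from an in-memory string, and passes a pd.Timedelta as the resampling rule instead of the '15S' string, since the spelling of frequency aliases has changed between pandas versions. The names raw and resampled are illustrative.

```python
import io

import pandas as pd

raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

# Read without any date parsing; the index is plain floats at this point.
x = pd.read_csv(io.StringIO(raw), sep=';', header=None,
                names=['DateTime', 'X'], index_col='DateTime')

# Convert seconds since the epoch into a DatetimeIndex.
x.index = pd.to_datetime(x.index, unit='s')

# Resampling now works because the index is a DatetimeIndex.
resampled = x.resample(pd.Timedelta(seconds=15)).mean()
print(resampled)
```

All five sample rows fall within the same 15-second bin, so the result here is a single row holding their mean.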

Timings:

In [9]:
%%timeit
import time
def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv(io.StringIO(t), parse_dates=[0],  sep=';', 
                 date_parser=date_parser, 
                 index_col='DateTime', 
                 names=['DateTime', 'X'], header=None)
100 loops, best of 3: 4.07 ms per loop

and

In [12]:
%%timeit
t="""1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
df.index = pd.to_datetime(df.index, unit='s')
100 loops, best of 3: 1.69 ms per loop

So using to_datetime is over 2x faster on this small dataset, and I expect it to scale much better than the other methods.
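
The scaling expectation can be checked outside IPython with the standard timeit module. The sketch below is only an illustration: it uses synthetic timestamps (an assumption, not the question's data), and it compares a per-value Python conversion with the vectorized to_datetime call rather than reproducing the read_csv runs timed above. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit
from datetime import datetime, timezone

import numpy as np
import pandas as pd

# Synthetic data (assumption): 20,000 timestamps spaced 0.6 s apart.
stamps = 1449054136.83 + 0.6 * np.arange(20_000)

def per_value():
    # Convert each timestamp in a Python loop, similar in spirit to a custom parser.
    return pd.DatetimeIndex(
        [datetime.fromtimestamp(s, tz=timezone.utc) for s in stamps])

def vectorized():
    # Convert the whole array in one call.
    return pd.to_datetime(stamps, unit='s', utc=True)

slow = timeit.timeit(per_value, number=3)
fast = timeit.timeit(vectorized, number=3)
print(f"per-value: {slow:.3f} s, vectorized: {fast:.3f} s")
```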