Pandas - How to convert RangeIndex into DatetimeIndex

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/48248239/

Date: 2020-09-14 05:03:13  Source: igfitidea

Tags: python, pandas, indexing, time-series

Asked by rioZg

I have the following dataframe. It is OHLC one-minute data. Obviously I need the T column to become an index in order to use time-series functionality.

            C       H       L       O                    T            V
13712  6873.0  6873.0  6873.0  6873.0  2018-01-13T17:17:00   799.448421
13713  6878.0  6878.0  6875.0  6875.0  2018-01-13T17:18:00  1707.578666
13714  6880.0  6880.0  6825.0  6825.0  2018-01-13T17:21:00   481.245707
13715  6876.0  6876.0  6876.0  6876.0  2018-01-13T17:22:00   839.177283
13716  6870.0  6878.0  6830.0  6878.0  2018-01-13T17:23:00  4336.830277

I used:

df['T'] = pd.to_datetime(df['T'])

So far so good! The T column is now recognised as a datetime column.

Check:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13717 entries, 1970-01-01 00:00:00 to 1970-01-01 00:00:00.000013716
Data columns (total 7 columns):
BV    13717 non-null float64
C     13717 non-null float64
H     13717 non-null float64
L     13717 non-null float64
O     13717 non-null float64
T     13717 non-null datetime64[ns]
V     13717 non-null float64
dtypes: datetime64[ns](1), float64(6)
memory usage: 857.3 KB

And now comes the fun and inexplicable part:

df.set_index(df['T'])


                          C       H       L       O                    T            V
T
2018-01-03 17:27:00  5710.0  5710.0  5663.0  5667.0  2018-01-03 17:27:00  3863.030204
2018-01-03 17:28:00  5704.0  5710.0  5663.0  5710.0  2018-01-03 17:28:00  1208.627542
2018-01-03 17:29:00  5699.0  5699.0  5663.0  5663.0  2018-01-03 17:29:00  1755.123688

Still looks good, but when I check the type of index I get:

RangeIndex(start=0, stop=13717, step=1)

And now if I try:

df.index = pd.to_datetime(df.index)

I end up with:

DatetimeIndex([          '1970-01-01 00:00:00',
               '1970-01-01 00:00:00.000000001',
               '1970-01-01 00:00:00.000000002',
               '1970-01-01 00:00:00.000000003',
               '1970-01-01 00:00:00.000000004' and so on...

which is evidently wrong.

The questions are:

  1. Why don't I get the normal DatetimeIndex if I am converting a date to index?
  2. How can I convert that RangeIndex to DatetimeIndex with correct timestamps?

Thanks!

Answered by jezrael

If the input data is a CSV file, the simplest approach is to use the parse_dates and index_col parameters of read_csv:

df = pd.read_csv(file, parse_dates=['T'], index_col=['T'])
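As a rough illustration of the read_csv route, here is a minimal sketch. The CSV text is made up to mirror the columns in the question, and io.StringIO stands in for a real file path; both are assumptions for the sake of a runnable example.

```python
import io
import pandas as pd

# Hypothetical sample data shaped like the question's OHLC frame.
csv_text = """C,H,L,O,T,V
6873.0,6873.0,6873.0,6873.0,2018-01-13T17:17:00,799.448421
6878.0,6878.0,6875.0,6875.0,2018-01-13T17:18:00,1707.578666
6880.0,6880.0,6825.0,6825.0,2018-01-13T17:21:00,481.245707
"""

# Parse T as datetimes and use it as the index in one step.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['T'], index_col='T')

print(type(df_csv.index).__name__)
print(df_csv.head())
```

With this approach T should no longer appear as a regular column, because it has been moved into the index.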

If not, use your own approach, but remember to assign the result of set_index back to df. If you also want the T column dropped once it becomes the DatetimeIndex, pass the column name 'T' instead of df['T']:

df['T'] = pd.to_datetime(df['T'])
df = df.set_index('T')

# alternative: modify df in place instead of reassigning
# df.set_index('T', inplace=True)
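The following sketch may help show why the assignment matters: set_index returns a new DataFrame and leaves the original untouched. The small frame and the names df_demo, index_before, index_after and later are invented for illustration only.

```python
import pandas as pd

# Hypothetical three-row frame with the same T column as in the question.
df_demo = pd.DataFrame({
    'T': ['2018-01-13T17:17:00', '2018-01-13T17:18:00', '2018-01-13T17:21:00'],
    'C': [6873.0, 6878.0, 6880.0],
})
df_demo['T'] = pd.to_datetime(df_demo['T'])

# The result is discarded here, so df_demo keeps its default index.
df_demo.set_index(df_demo['T'])
index_before = type(df_demo.index).__name__

# Assigning the result back replaces the index and drops the T column.
df_demo = df_demo.set_index('T')
index_after = type(df_demo.index).__name__

# Time-based label slicing is now available.
later = df_demo.loc['2018-01-13 17:18':]

print(index_before, index_after)
print(later)
```

Passing the column name 'T' rather than the Series df_demo['T'] is what removes T from the columns; passing the Series keeps a duplicate T column next to the new index, as seen in the question's output.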

Why don't I get the normal DatetimeIndex if I am converting a date to index?

Because the result of set_index was never assigned back, your index is still the default RangeIndex (0, 1, 2, ...), so df.index = pd.to_datetime(df.index) interprets those integers as nanoseconds since the Unix epoch (1970-01-01) and produces the odd datetimes.
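To see the integer-as-nanoseconds behaviour in isolation, here is a small sketch. The three-element RangeIndex is a stand-in for the question's 13717-row index, and the variable names are illustrative.

```python
import pandas as pd

# A default index like the one the frame in the question still had.
default_index = pd.RangeIndex(start=0, stop=3, step=1)

# Plain integers are read as offsets from 1970-01-01, in nanoseconds by default.
as_datetimes = pd.to_datetime(default_index)
print(as_datetimes)

# The unit can be changed, which shows the values are just epoch offsets.
one_second = pd.to_datetime(1, unit='s')
print(one_second)
```

This is why the converted index starts at 1970-01-01 and advances by one nanosecond per row: the row numbers, not the values in T, are being converted.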