Creating Time Series from Pandas DataFrame
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/43708875/
Asked by Renée
I have a dataframe with various attributes, including one datetime column. I want to extract one of the attribute columns as a time series indexed by the datetime column. This seemed pretty straightforward, and I can construct time series with random values, as all the pandas docs show. But when I do so from a dataframe, my attribute values all convert to NaN.
Here's an analogous example.
import pandas as pd
df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})
s = pd.Series(df.a, index=df.date)
In this case, the series will have the correct time series index, but all the values will be NaN.
I can build the series in two steps, as below, but I don't understand why this should be required.
s = pd.Series(df.a)
s.index = df.date
What am I missing? I assume it has to do with series references, but I don't understand at all why the values would become NaN.
I am also able to get it to work by copying the index column.
s = pd.Series(df.a, df.date.copy())
Answered by Craig
The problem is that df.a is already a Series with its own index (0 and 1), so pd.Series() tries to use the values specified in index as labels to select values from it. The date values are not present in that existing index, so every lookup comes back as NaN.
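To make that label lookup concrete, here is a minimal sketch (not part of the original answer) that contrasts passing the column itself with passing only its raw values. The variable names aligned and positional are illustrative, and to_numpy() assumes a reasonably recent pandas version; on older versions the .values attribute plays the same role.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1],
                   'date': pd.to_datetime(['2017-04-01', '2017-04-02'])})

# df['a'] is a Series labelled 0 and 1. Asking for date labels finds
# no matching label, so every value comes back as NaN.
aligned = pd.Series(df['a'], index=df['date'])

# Passing the raw values drops the old labels, so the values are
# assigned to the new index by position instead of by label.
positional = pd.Series(df['a'].to_numpy(), index=df['date'], name='a')

print(aligned)
print(positional)
```

This is also why the two-step version in the question works: assigning to s.index relabels the existing values in place rather than looking them up by label.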
You can set the index to the date column and then select the one data column you want. This will return a series with the dates as the index.
import pandas as pd
df = pd.DataFrame({'a': [0, 1],
                   'date': [pd.to_datetime('2017-04-01'),
                            pd.to_datetime('2017-04-02')]})
s = df.set_index('date')['a']
Examining s gives:
In [1]: s
Out[1]:
date
2017-04-01    0
2017-04-02    1
Name: a, dtype: int64
And you can confirm that s is a Series:
In [2]: isinstance(s, pd.Series)
Out[2]: True