pandas: setting a Series as the index

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/40029071/

Time: 2020-09-14 02:12:29  Source: igfitidea

Setting Series as index

Tags: python, python-2.7, pandas, dataframe, series

Asked by zthomas.nc

I'm using Python 2.7 to take a numerical column of my DataFrame, data, and make it an individual object (a Series) whose index is the dates from another column of data.

new_series = pd.Series(data['numerical_column'] , index=data['dates'])

However, when I do this, I get a bunch of NaN values in the Series:

dates
1980-01-31   NaN
1980-02-29   NaN
1980-03-31   NaN
1980-04-30   NaN
1980-05-31   NaN
1980-06-30   NaN
...

Why do my numerical_data values just disappear?

I realize that I can apparently achieve this goal by doing the following, although I'm curious why my initial approach failed.

new_series = data.set_index('dates')['numerical_column']

Answered by jezrael

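I think the problem is index alignment. The column data['numerical_column'] carries its own index (0, 1, 2, ...), and when a Series is passed as data together with an index, pandas reindexes it to the new labels. None of the dates match the existing integer labels, so every value becomes NaN.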
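To make the alignment behaviour concrete, here is a minimal sketch on toy data (the values and labels below are hypothetical and not from the question). It shows that passing a Series plus an index behaves like a reindex on labels, so labels missing from the original index come back as NaN.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate label alignment.
s = pd.Series([10, 20, 30], index=[0, 1, 2])

# Labels 1 and 2 exist in s; label 3 does not, so it becomes NaN.
aligned = pd.Series(s, index=[1, 2, 3])

# The same result obtained with an explicit reindex.
reindexed = s.reindex([1, 2, 3])

# With no overlapping labels at all, every value is NaN,
# which is the situation described in the question.
no_overlap = pd.Series(s, index=['a', 'b', 'c'])

print(aligned)
print(no_overlap)
```

Because a missing label introduces NaN, the integer values are also upcast to float in the aligned result.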

So you need to convert the column to a NumPy array with .values, which drops the original index:

new_series = pd.Series(data['numerical_column'].values , index=data['dates'])

Sample:

import pandas as pd
import datetime

data = pd.DataFrame({
'dates': {0: datetime.date(1980, 1, 31), 1: datetime.date(1980, 2, 29), 
          2: datetime.date(1980, 3, 31), 3: datetime.date(1980, 4, 30), 
          4: datetime.date(1980, 5, 31), 5: datetime.date(1980, 6, 30)}, 
'numerical_column': {0: 1, 1: 4, 2: 5, 3: 3, 4: 1, 5: 0}})
print (data)
        dates  numerical_column
0  1980-01-31                 1
1  1980-02-29                 4
2  1980-03-31                 5
3  1980-04-30                 3
4  1980-05-31                 1
5  1980-06-30                 0

new_series = pd.Series(data['numerical_column'].values , index=data['dates'])
print (new_series)
dates
1980-01-31    1
1980-02-29    4
1980-03-31    5
1980-04-30    3
1980-05-31    1
1980-06-30    0
dtype: int64

The set_index approach is nicer to read, but slower:

#[60000 rows x 2 columns]
data = pd.concat([data]*10000).reset_index(drop=True)

In [65]: %timeit pd.Series(data['numerical_column'].values , index=data['dates'])
1000 loops, best of 3: 308 μs per loop

In [66]: %timeit data.set_index('dates')['numerical_column']
1000 loops, best of 3: 1.28 ms per loop
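The timings above come from an IPython session. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The helper names via_values and via_set_index are made up for this example, and the absolute numbers will depend on the machine and the pandas version, so no particular result is assumed here.

```python
import datetime
import timeit

import pandas as pd

# Same shape of data as in the sample, repeated to 60000 rows.
data = pd.DataFrame({
    'dates': [datetime.date(1980, 1, 31), datetime.date(1980, 2, 29),
              datetime.date(1980, 3, 31), datetime.date(1980, 4, 30),
              datetime.date(1980, 5, 31), datetime.date(1980, 6, 30)],
    'numerical_column': [1, 4, 5, 3, 1, 0],
})
data = pd.concat([data] * 10000).reset_index(drop=True)


def via_values(df):
    # Hypothetical helper: build the Series from the raw array.
    return pd.Series(df['numerical_column'].values, index=df['dates'])


def via_set_index(df):
    # Hypothetical helper: move 'dates' into the index, then select.
    return df.set_index('dates')['numerical_column']


# Total seconds for 100 calls of each approach.
t_values = timeit.timeit(lambda: via_values(data), number=100)
t_set_index = timeit.timeit(lambda: via_set_index(data), number=100)

print('values:    %.6f s per call' % (t_values / 100))
print('set_index: %.6f s per call' % (t_set_index / 100))
```

Both helpers return the same values on the same index; the set_index result additionally keeps the column name as the Series name.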

Verification:

If the column's index already matches the new index, the original approach works fine:

s = data.set_index('dates')['numerical_column']
df = s.to_frame()
print (df)
            numerical_column
dates                       
1980-01-31                 1
1980-02-29                 4
1980-03-31                 5
1980-04-30                 3
1980-05-31                 1
1980-06-30                 0

new_series = pd.Series(df['numerical_column'] , index=data['dates'])
print (new_series)
dates
1980-01-31    1
1980-02-29    4
1980-03-31    5
1980-04-30    3
1980-05-31    1
1980-06-30    0
Name: numerical_column, dtype: int64
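The verification works because the match is made by label, not by position. The following sketch, using a shortened three-row version of the sample (an assumption made for brevity), passes the dates in reverse order to show the difference between the two constructions.

```python
import datetime

import pandas as pd

dates = [datetime.date(1980, 1, 31),
         datetime.date(1980, 2, 29),
         datetime.date(1980, 3, 31)]

# A Series already indexed by dates, like df['numerical_column'] above.
s = pd.Series([1, 4, 5], index=dates, name='numerical_column')

reversed_dates = dates[::-1]

# Passing the Series aligns by label: each date keeps its own value.
by_label = pd.Series(s, index=reversed_dates)

# Passing the raw array assigns by position: values stay in their
# original order and are simply relabelled.
by_position = pd.Series(s.values, index=reversed_dates)

print(by_label.tolist())     # [5, 4, 1]
print(by_position.tolist())  # [1, 4, 5]
```

So .values is the right tool when the new index is meant to relabel rows in their existing order, while passing the Series itself is appropriate only when the labels are already shared.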