pandas: setting a Series as the index

Question

Asked by zthomas.nc

I'm using Python 2.7 to take a numerical column of my DataFrame data and make it a standalone object (a Series) whose index is the dates stored in another column of data.

new_series = pd.Series(data['numerical_column'] , index=data['dates'])

However, when I do this, I get a bunch of NaN values in the Series:

dates
1980-01-31   NaN
1980-02-29   NaN
1980-03-31   NaN
1980-04-30   NaN
1980-05-31   NaN
1980-06-30   NaN
...

Why do my numerical_data values just disappear?

I realize that I can apparently achieve this goal by doing the following, although I'm curious why my initial approach failed.

new_series = data.set_index('dates')['numerical_column']

Answer 1

Answered by jezrael

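I think the problem is index alignment. data['numerical_column'] is a Series that carries its own index (the DataFrame's row labels), and when a Series is passed as data together with index=, pandas aligns it to the new index by label. None of the date labels exist in the original index, so every value comes back as NaN.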

So you need to convert it to a NumPy array with .values, which drops the old index:

new_series = pd.Series(data['numerical_column'].values , index=data['dates'])
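To make the alignment behaviour concrete, here is a minimal sketch on a two-row frame (the small frame and the variable names aligned, reindexed and positional are made up for illustration). It suggests that passing a Series plus index= behaves like calling reindex on that Series, while passing the bare array assigns values by position:

```python
import datetime

import pandas as pd

# Tiny illustrative frame; rows are labelled 0 and 1 by default.
data = pd.DataFrame({
    "dates": [datetime.date(1980, 1, 31), datetime.date(1980, 2, 29)],
    "numerical_column": [1, 4],
})

# Series as data: its labels (0, 1) are looked up in the requested index
# (the dates). Nothing matches, so every value is NaN.
aligned = pd.Series(data["numerical_column"], index=data["dates"])

# The same lookup written explicitly with reindex.
reindexed = data["numerical_column"].reindex(data["dates"])

# NumPy array as data: no labels to align, values are placed by position.
positional = pd.Series(data["numerical_column"].values, index=data["dates"])

print(aligned)
print(reindexed)
print(positional)
```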

Sample:

import pandas as pd
import datetime

data = pd.DataFrame({
'dates': {0: datetime.date(1980, 1, 31), 1: datetime.date(1980, 2, 29), 
          2: datetime.date(1980, 3, 31), 3: datetime.date(1980, 4, 30), 
          4: datetime.date(1980, 5, 31), 5: datetime.date(1980, 6, 30)}, 
'numerical_column': {0: 1, 1: 4, 2: 5, 3: 3, 4: 1, 5: 0}})
print (data)
        dates  numerical_column
0  1980-01-31                 1
1  1980-02-29                 4
2  1980-03-31                 5
3  1980-04-30                 3
4  1980-05-31                 1
5  1980-06-30                 0

new_series = pd.Series(data['numerical_column'].values , index=data['dates'])
print (new_series)
dates
1980-01-31    1
1980-02-29    4
1980-03-31    5
1980-04-30    3
1980-05-31    1
1980-06-30    0
dtype: int64

The set_index approach is nicer to read, but slower:

#[60000 rows x 2 columns]
data = pd.concat([data]*10000).reset_index(drop=True)

In [65]: %timeit pd.Series(data['numerical_column'].values , index=data['dates'])
1000 loops, best of 3: 308 μs per loop

In [66]: %timeit data.set_index('dates')['numerical_column']
1000 loops, best of 3: 1.28 ms per loop
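The %timeit lines above only run inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The names base, t_values and t_set_index and the repeat count of 20 are choices made for this example, and the absolute numbers will differ by machine and pandas version:

```python
import datetime
import timeit

import pandas as pd

# Rebuild the 6-row sample frame, then repeat it to 60000 rows.
base = pd.DataFrame({
    "dates": [datetime.date(1980, 1, 31), datetime.date(1980, 2, 29),
              datetime.date(1980, 3, 31), datetime.date(1980, 4, 30),
              datetime.date(1980, 5, 31), datetime.date(1980, 6, 30)],
    "numerical_column": [1, 4, 5, 3, 1, 0],
})
data = pd.concat([base] * 10000).reset_index(drop=True)

# Total seconds for 20 runs of each approach.
t_values = timeit.timeit(
    lambda: pd.Series(data["numerical_column"].values, index=data["dates"]),
    number=20,
)
t_set_index = timeit.timeit(
    lambda: data.set_index("dates")["numerical_column"],
    number=20,
)

print("values + index:", t_values / 20, "s per call")
print("set_index:     ", t_set_index / 20, "s per call")
```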

Verification:

If the Series passed as data already has the same index labels as the requested index, it works fine:

s = data.set_index('dates')['numerical_column']
df = s.to_frame()
print (df)
            numerical_column
dates                       
1980-01-31                 1
1980-02-29                 4
1980-03-31                 5
1980-04-30                 3
1980-05-31                 1
1980-06-30                 0

new_series = pd.Series(df['numerical_column'] , index=data['dates'])
print (new_series)
dates
1980-01-31    1
1980-02-29    4
1980-03-31    5
1980-04-30    3
1980-05-31    1
1980-06-30    0
Name: numerical_column, dtype: int64
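The verification works because values are matched by label, not by position. A small sketch with made-up string labels (s and reordered are illustrative names) shows the lookup: matching labels keep their values in the requested order, and a label that is missing from the source becomes NaN.

```python
import pandas as pd

# Source Series with labels a, b, c.
s = pd.Series([1, 4, 5], index=["a", "b", "c"])

# Request labels in a different order, plus one label ("x") that s lacks.
reordered = pd.Series(s, index=["c", "a", "x"])

print(reordered)
# "c" picks up 5, "a" picks up 1, and "x" has no match so it is NaN.
# The result is float64 because NaN cannot be stored in an int64 Series.
```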
