Create a Series from a Pandas DataFrame by choosing an element from different columns on each row

Question

Asked by Brian

My goal is to create a Series from a Pandas DataFrame by choosing an element from different columns on each row.

For example, I have the following DataFrame:

In [171]: pred[:10]
Out[171]: 
                     0  1  2
Timestamp                   
2010-12-21 00:00:00  0  0  1
2010-12-20 00:00:00  1  1  1
2010-12-17 00:00:00  1  1  1
2010-12-16 00:00:00  0  0  1
2010-12-15 00:00:00  1  1  1
2010-12-14 00:00:00  1  1  1
2010-12-13 00:00:00  0  0  1
2010-12-10 00:00:00  1  1  1
2010-12-09 00:00:00  1  1  1
2010-12-08 00:00:00  0  0  1

And, I have the following series:

In [172]: useProb[:10]
Out[172]: 
Timestamp
2010-12-21 00:00:00    1
2010-12-20 00:00:00    2
2010-12-17 00:00:00    1
2010-12-16 00:00:00    2
2010-12-15 00:00:00    2
2010-12-14 00:00:00    2
2010-12-13 00:00:00    0
2010-12-10 00:00:00    2
2010-12-09 00:00:00    2
2010-12-08 00:00:00    0

I would like to create a new series, usePred, that takes the values from pred, based on the column information in useProb to return the following:

In [172]: usePred[:10]
Out[172]: 
Timestamp
2010-12-21 00:00:00    0
2010-12-20 00:00:00    1
2010-12-17 00:00:00    1
2010-12-16 00:00:00    1
2010-12-15 00:00:00    1
2010-12-14 00:00:00    1
2010-12-13 00:00:00    0
2010-12-10 00:00:00    1
2010-12-09 00:00:00    1
2010-12-08 00:00:00    0

This last step is where I fail. I've tried things like:

usePred = pd.DataFrame(index = pred.index)
for row in usePred:
    usePred['PREDS'].ix[row] = pred.ix[row, useProb[row]]

And, I've tried:

usePred['PREDS'] = pred.iloc[:,useProb]

I googled and searched Stack Overflow for hours, but can't seem to solve the problem.

Answer 1

Answered by Andy Hayden

One solution could be to use get_dummies (which should be more efficient than apply):

In [11]: (pd.get_dummies(useProb) * pred).sum(axis=1)
Out[11]:
Timestamp
2010-12-21 00:00:00    0
2010-12-20 00:00:00    1
2010-12-17 00:00:00    1
2010-12-16 00:00:00    1
2010-12-15 00:00:00    1
2010-12-14 00:00:00    1
2010-12-13 00:00:00    0
2010-12-10 00:00:00    1
2010-12-09 00:00:00    1
2010-12-08 00:00:00    0
dtype: float64
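
For anyone who wants to run this without the original data, here is a minimal sketch of the same idea. The five-row pred and useProb below are rebuilt from rows shown in the question and are assumed to use integer column labels 0, 1, 2; the trailing astype(int) is an addition that only keeps the result integer-typed.

```python
import pandas as pd

# Small stand-in for the question's data (assumed integer column labels).
idx = pd.DatetimeIndex(
    ["2010-12-21", "2010-12-20", "2010-12-17", "2010-12-16", "2010-12-13"],
    name="Timestamp",
)
pred = pd.DataFrame(
    {0: [0, 1, 1, 0, 0], 1: [0, 1, 1, 0, 0], 2: [1, 1, 1, 1, 1]}, index=idx
)
useProb = pd.Series([1, 2, 1, 2, 0], index=idx)

# One indicator column per distinct value in useProb; each row has a single
# "on" entry marking the column to pick from pred.
mask = pd.get_dummies(useProb)

# Multiplying zeroes out every unselected column, so the row sum is the pick.
usePred = (mask * pred).sum(axis=1).astype(int)
print(usePred.tolist())  # [0, 1, 1, 1, 0]
```

This relies on the labels produced by get_dummies matching the column labels of pred, since the multiplication aligns on both index and columns.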

Alternatively, you could use apply with a couple of loc lookups:

In [21]: pred.apply(lambda row: row.loc[useProb.loc[row.name]], axis=1)
Out[21]:
Timestamp
2010-12-21 00:00:00    0
2010-12-20 00:00:00    1
2010-12-17 00:00:00    1
2010-12-16 00:00:00    1
2010-12-15 00:00:00    1
2010-12-14 00:00:00    1
2010-12-13 00:00:00    0
2010-12-10 00:00:00    1
2010-12-09 00:00:00    1
2010-12-08 00:00:00    0
dtype: int64

The trick is that you have access to each row's index label via the name property.
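
To make the role of row.name concrete, here is a small self-contained sketch. The data is a five-row stand-in rebuilt from the question, with integer column labels assumed, and pick is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Small stand-in for the question's data (assumed integer column labels).
idx = pd.DatetimeIndex(
    ["2010-12-21", "2010-12-20", "2010-12-17", "2010-12-16", "2010-12-13"],
    name="Timestamp",
)
pred = pd.DataFrame(
    {0: [0, 1, 1, 0, 0], 1: [0, 1, 1, 0, 0], 2: [1, 1, 1, 1, 1]}, index=idx
)
useProb = pd.Series([1, 2, 1, 2, 0], index=idx)


def pick(row):
    # With axis=1, each row arrives as a Series whose .name is its index label.
    column = useProb.loc[row.name]  # which column to read for this timestamp
    return row.loc[column]          # the value in that column


usePred = pred.apply(pick, axis=1)
print(usePred.tolist())  # [0, 1, 1, 1, 0]
```

Because this calls a Python function once per row, it is likely to be slower than a vectorized approach on large frames.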

Answer 2

Answered by unutbu

Here is another way to do it using DataFrame.lookup:

pred.lookup(row_labels=pred.index, 
            col_labels=pred.columns[useProb['0']])

It seems to be exactly what you need, except that care must be taken to supply values which are labels. For example, if the pred.columns values are strings and the useProb['0'] values are integers, then we could use

pred.columns[useProb['0']]

so that the values passed to the col_labels parameter are proper label values.

For example,

import io
import pandas as pd

# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
# so this example needs an older pandas release.

# On Python 3 the text must go through io.StringIO (io.BytesIO expects bytes).
content = io.StringIO('''\
Timestamp  0  1  2
2010-12-21 00:00:00  0  0  1
2010-12-20 00:00:00  1  1  1
2010-12-17 00:00:00  1  1  1
2010-12-16 00:00:00  0  0  1
2010-12-15 00:00:00  1  1  1
2010-12-14 00:00:00  1  1  1
2010-12-13 00:00:00  0  0  1
2010-12-10 00:00:00  1  1  1
2010-12-09 00:00:00  1  1  1
2010-12-08 00:00:00  0  0  1''')
# Fields are separated by two or more spaces; a regex separator needs the python engine.
pred = pd.read_table(content, sep=r'\s{2,}', engine='python', parse_dates=True, index_col=[0])

content = io.StringIO('''\
Timestamp  0
2010-12-21 00:00:00    1
2010-12-20 00:00:00    2
2010-12-17 00:00:00    1
2010-12-16 00:00:00    2
2010-12-15 00:00:00    2
2010-12-14 00:00:00    2
2010-12-13 00:00:00    0
2010-12-10 00:00:00    2
2010-12-09 00:00:00    2
2010-12-08 00:00:00    0''')
useProb = pd.read_table(content, sep=r'\s{2,}', engine='python', parse_dates=True, index_col=[0])

# pred.columns holds strings, so map the integer positions to column labels first.
print(pd.Series(pred.lookup(row_labels=pred.index,
                col_labels=pred.columns[useProb['0']]),
                index=pred.index))

yields

Timestamp
2010-12-21    0
2010-12-20    1
2010-12-17    1
2010-12-16    1
2010-12-15    1
2010-12-14    1
2010-12-13    0
2010-12-10    1
2010-12-09    1
2010-12-08    0
dtype: int64
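
On pandas 2.0 and later, where DataFrame.lookup is no longer available, one possible substitute is plain NumPy integer indexing. The sketch below is not part of the original answer. It assumes that useProb['0'] holds integer column positions and that useProb is aligned row for row with pred; the five-row data is a stand-in rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Small stand-in for the answer's data: pred has string column labels.
idx = pd.DatetimeIndex(
    ["2010-12-21", "2010-12-20", "2010-12-17", "2010-12-16", "2010-12-13"],
    name="Timestamp",
)
pred = pd.DataFrame(
    {"0": [0, 1, 1, 0, 0], "1": [0, 1, 1, 0, 0], "2": [1, 1, 1, 1, 1]}, index=idx
)
useProb = pd.DataFrame({"0": [1, 2, 1, 2, 0]}, index=idx)

# Assumption: these integers are column positions, one per row of pred.
col_positions = useProb["0"].to_numpy()
row_positions = np.arange(len(pred))

# Paired integer indexing picks one element per (row, column) position.
values = pred.to_numpy()[row_positions, col_positions]
usePred = pd.Series(values, index=pred.index)
print(usePred.tolist())  # [0, 1, 1, 1, 0]
```

If useProb held column labels rather than positions, pred.columns.get_indexer could be used to convert them to positions first.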
