Python: Pandas lookup based on value

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/25935431/

Time: 2020-08-18 23:50:18  Source: igfitidea

Tags: python, pandas, numpy, dataframe

Question by AtotheSiv

I have the following DataFrame:

Date    best    a    b    c    d
1990    a       5    4    7    2
1991    c       10   1    2    0
1992    d       2    1    4    12
1993    a       5    8    11   6

I would like to make a dataframe as follows:

Date    best    value   
1990    a       5
1991    c       2
1992    d       12
1993    a       5

So for each row I want to use the entry in the best column as a column name and look up that column's value in the same row. For instance, the 1990 row of the second df should look up column "a" in the first df (=5), and the 1991 row should look up column "c" (=2).
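
To make the requirement concrete, here is a minimal sketch that rebuilds the sample DataFrame from the question (storing Date as integers is an assumption) and computes the desired value column with a plain Python loop: for each row, the label in best is used as a column name on that same row.

```python
import pandas as pd

# Sample data from the question (Date stored as int here; an assumption)
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993],
        "best": ["a", "c", "d", "a"],
        "a": [5, 10, 2, 5],
        "b": [4, 1, 1, 8],
        "c": [7, 2, 4, 11],
        "d": [2, 0, 12, 6],
    }
)

# For each row label i, read the cell in the column named by that row's 'best'
values = [df.at[i, col] for i, col in df["best"].items()]

# Keep only Date and best, plus the looked-up value
result = df[["Date", "best"]].assign(value=values)
print(result)
```

This loop is only meant to pin down the expected output; the answers below show more idiomatic ways to get the same column.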

Any ideas?

Answer by EdChum

You can create a lookup function and call apply on your DataFrame row-wise (axis=1). This isn't very efficient for large DataFrames, though, because the function is called once per row in Python.

In [245]:

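def lookup(row):
    # row['best'] holds a column name; return that column's value in the same row
    return row[row['best']]
df['value'] = df.apply(lookup, axis=1)  # axis=1 passes each row as a Series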
df
Out[245]:
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5

Answer by chrisb

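There is a built-in lookup function that handles this kind of situation (it looks up values by row/column label pairs). I don't know how well optimized it is, but it may be faster than the apply solution. Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the call below only works on older versions.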

In [9]: df['value'] = df.lookup(df.index, df['best'])

In [10]: df
Out[10]: 
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
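
On pandas versions that no longer have DataFrame.lookup, the pandas documentation suggests an equivalent built from pd.factorize, reindex and NumPy integer indexing. A minimal sketch on the question's data (again assuming Date is stored as integers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993],
        "best": ["a", "c", "d", "a"],
        "a": [5, 10, 2, 5],
        "b": [4, 1, 1, 8],
        "c": [7, 2, 4, 11],
        "d": [2, 0, 12, 6],
    }
)

# idx[i] is the position of row i's 'best' label within cols (the unique labels)
idx, cols = pd.factorize(df["best"])

# Reorder the columns to match cols, then pick one cell per row:
# row i, column idx[i]
df["value"] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(df)
```

If a label in best has no matching column, reindex adds an all-NaN column for it, so that row gets NaN rather than an error.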

Answer by Dev Khadka

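You can do this using np.where, as below. I think it will be more efficient, since the loop runs once per candidate column rather than once per row, and each np.where call is vectorized over all rows.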

import numpy as np
import pandas as pd

df = pd.DataFrame([['1990', 'a', 5, 4, 7, 2], ['1991', 'c', 10, 1, 2, 0], ['1992', 'd', 2, 1, 4, 12], ['1993', 'a', 5, 8, 11, 6]], columns=('Date', 'best', 'a', 'b', 'c', 'd'))
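# Start from the labels in 'best'; each label is replaced by its value below
arr = df.best.values

cols = df.columns[2:]  # candidate columns: a, b, c, d
for col in cols:
    arr2 = df[col].values
    # Where the row's label equals this column's name, take that column's value
    arr = np.where(arr==col, arr2, arr)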

df.drop(columns=cols, inplace=True)
df["values"] = arr
df

Result

   Date best  values
0  1990    a       5
1  1991    c       2
2  1992    d      12
3  1993    a       5
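
The loop above calls np.where once per candidate column. A more compact variant (a substitution of mine, not part of the original answer) hands all the conditions to np.select in a single call. This sketch assumes, as the answer does, that the candidate columns are everything after Date and best:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['1990', 'a', 5, 4, 7, 2], ['1991', 'c', 10, 1, 2, 0],
     ['1992', 'd', 2, 1, 4, 12], ['1993', 'a', 5, 8, 11, 6]],
    columns=['Date', 'best', 'a', 'b', 'c', 'd'],
)

cols = df.columns[2:]  # candidate columns: a, b, c, d

# One boolean mask per candidate column, plus the values to take where it holds
conditions = [df["best"].eq(col).to_numpy() for col in cols]
choices = [df[col].to_numpy() for col in cols]

# For each row, np.select takes the choice of the first True condition
# (rows matching no condition get the default, 0)
result = df[["Date", "best"]].assign(value=np.select(conditions, choices))
print(result)
```

Unlike the np.where loop, this keeps the result numeric (int64 here) instead of an object array, and it leaves the original DataFrame untouched.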