Python: Pandas lookup based on value
Original source: http://stackoverflow.com/questions/25935431/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Pandas lookup based on value
Asked by AtotheSiv
I have the following DataFrame:
Date  best  a   b  c   d
1990  a     5   4  7   2
1991  c     10  1  2   0
1992  d     2   1  4   12
1993  a     5   8  11  6
I would like to make a dataframe as follows:
Date  best  value
1990  a     5
1991  c     2
1992  d     12
1993  a     5
So I want to look up a value in each row, using that row's entry in "best" as the column name. For instance, the value for 1990 in the second df should come from column "a" of the first df (= 5), and the value for the second row should come from column "c" (= 2).
Any ideas?
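To run the answers below, the question's example frame first has to exist as a variable named df. A minimal sketch of how it could be rebuilt follows; storing Date as an integer is an assumption, since the question only shows the printed table.

```python
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: Date is an integer column; the printed table does not show dtypes.
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993],
        "best": ["a", "c", "d", "a"],  # name of the column to read in each row
        "a": [5, 10, 2, 5],
        "b": [4, 1, 1, 8],
        "c": [7, 2, 4, 11],
        "d": [2, 0, 12, 6],
    }
)
print(df)
```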
Answered by EdChum
You can create a lookup function and call apply on your dataframe row-wise, though this isn't very efficient for large dataframes.
In [245]:
def lookup(x):
    # x is a single row; x.best names the column whose value we want
    return x[x.best]

df['value'] = df.apply(lookup, axis=1)
df
Out[245]:
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
Answered by chrisb
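There is a built-in lookup function that can handle this type of situation (it looks up values by row/column label pairs). I don't know how optimized it is, but it may be faster than the apply solution. (Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions.)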
In [9]: df['value'] = df.lookup(df.index, df['best'])
In [10]: df
Out[10]:
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
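On pandas versions that no longer ship DataFrame.lookup, the same row/column lookup can be sketched with pd.factorize plus NumPy integer indexing, along the lines of the replacement pattern the pandas user guide suggests. The frame is rebuilt here so the snippet runs on its own; the integer Date column and the names codes, labels and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumption: Date stored as an integer).
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993],
        "best": ["a", "c", "d", "a"],
        "a": [5, 10, 2, 5],
        "b": [4, 1, 1, 8],
        "c": [7, 2, 4, 11],
        "d": [2, 0, 12, 6],
    }
)

# codes[i] is the position of row i's label inside `labels` (the unique labels).
codes, labels = pd.factorize(df["best"])

# Keep only the columns named in `labels`, in that order, as a 2-D array.
values = df.reindex(columns=labels).to_numpy()

# Pick exactly one cell per row: row i, column codes[i].
df["value"] = values[np.arange(len(df)), codes]
print(df[["Date", "best", "value"]])
```

This avoids a Python-level call per row, so it should scale better than apply on large frames, though that has not been measured here.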
Answered by Dev Khadka
You can do this using np.where as shown below. I think it will be more efficient.
import numpy as np
import pandas as pd
df = pd.DataFrame([['1990', 'a', 5, 4, 7, 2], ['1991', 'c', 10, 1, 2, 0], ['1992', 'd', 2, 1, 4, 12], ['1993', 'a', 5, 8, 11, 6]], columns=('Date', 'best', 'a', 'b', 'c', 'd'))
# Start from the labels in 'best'; each pass swaps a matching label for that column's value
arr = df.best.values
cols = df.columns[2:]
for col in cols:
    arr2 = df[col].values
    arr = np.where(arr == col, arr2, arr)
df.drop(columns=cols, inplace=True)
df["values"] = arr
df
Result
   Date best  values
0  1990    a       5
1  1991    c       2
2  1992    d      12
3  1993    a       5
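One detail of the loop above is that arr starts out holding the labels themselves, so the result is an object array that mixes strings and numbers until every label has been replaced. A hedged variant is sketched below: it writes into a separate NaN-filled array and leaves the original frame untouched. The function name pick_best and its parameters are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def pick_best(df, key="best", candidates=("a", "b", "c", "d")):
    """Return, for each row, the value of the column named in df[key].

    Hypothetical helper: rows whose label is not in `candidates` stay NaN.
    """
    keys = df[key].to_numpy(dtype=object)
    out = np.full(len(df), np.nan)  # float array, one slot per row
    for col in candidates:
        # Where the row's label equals this column name, take that column's value.
        out = np.where(keys == col, df[col].to_numpy(), out)
    return out

df = pd.DataFrame(
    [['1990', 'a', 5, 4, 7, 2], ['1991', 'c', 10, 1, 2, 0],
     ['1992', 'd', 2, 1, 4, 12], ['1993', 'a', 5, 8, 11, 6]],
    columns=('Date', 'best', 'a', 'b', 'c', 'd'),
)
result = pick_best(df)
print(result)
```

Because the output array starts as NaN, the result is floating point; it can be cast back with astype(int) when every row is known to match one of the candidate columns.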

