Python: Pandas lookup based on value

Question

Asked by AtotheSiv

I have the following DataFrame:

Date    best    a    b    c    d
1990    a       5    4    7    2
1991    c       10   1    2    0
1992    d       2    1    4    12
1993    a       5    8    11   6

I would like to make a dataframe as follows:

Date    best    value   
1990    a       5
1991    c       2
1992    d       12
1993    a       5

So I want to look up a value in each row, using the column whose name is stored in that row's "best" entry. For instance, the value for 1990 in the second df should be looked up from column "a" of the first df (= 5), and the second row should be looked up from column "c" (= 2).

Any ideas?

Answer 1

Answered by EdChum

You can create a lookup function and call apply on your dataframe row-wise. This isn't very efficient for large dfs, though.

In [245]:
def lookup(row):
    # Return the value from the column named in this row's "best" entry.
    return row[row['best']]

df['value'] = df.apply(lookup, axis=1)
df
Out[245]:
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
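
The snippet above assumes df already exists and leaves the a to d columns in place. A minimal self-contained sketch of the same row-wise approach, which also trims the result to the three columns requested in the question, might look like this (the name result is an illustrative choice, not part of the original answer):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [[1990, 'a', 5, 4, 7, 2],
     [1991, 'c', 10, 1, 2, 0],
     [1992, 'd', 2, 1, 4, 12],
     [1993, 'a', 5, 8, 11, 6]],
    columns=['Date', 'best', 'a', 'b', 'c', 'd'])

# For each row, pick the entry in the column named by that row's 'best' value.
df['value'] = df.apply(lambda row: row[row['best']], axis=1)

# Keep only the columns the question asks for.
result = df[['Date', 'best', 'value']]
print(result)
```

Because apply with axis=1 calls a Python function once per row, this stays convenient for small frames but scales poorly, as the answer notes.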

Answer 2

Answered by chrisb

There is a built-in lookup function that can handle this type of situation (it looks up values by row and column labels). I don't know how optimized it is, but it may be faster than the apply solution.

In [9]: df['value'] = df.lookup(df.index, df['best'])

In [10]: df
Out[10]: 
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
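
Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the call above fails on current releases. One possible replacement, sketched here under the assumption that every entry in "best" names one of the value columns, uses NumPy integer indexing (value_cols, row_pos and col_pos are illustrative names, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [[1990, 'a', 5, 4, 7, 2],
     [1991, 'c', 10, 1, 2, 0],
     [1992, 'd', 2, 1, 4, 12],
     [1993, 'a', 5, 8, 11, 6]],
    columns=['Date', 'best', 'a', 'b', 'c', 'd'])

value_cols = ['a', 'b', 'c', 'd']      # columns that hold the candidate values
values = df[value_cols].to_numpy()     # 2-D integer array, one row per df row

# Position of each row's 'best' label within value_cols.
# get_indexer returns -1 for unknown labels, which would silently select
# the last column, so check for that before indexing.
col_pos = pd.Index(value_cols).get_indexer(df['best'])
if (col_pos == -1).any():
    raise KeyError("some entries in 'best' are not value columns")

row_pos = np.arange(len(df))
df['value'] = values[row_pos, col_pos]  # one (row, column) pick per row
print(df)
```

Restricting the array to the numeric columns keeps the result as integers instead of falling back to the object dtype that a mixed-type frame would produce.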

Answer 3

Answered by Dev Khadka

You can do this using np.where, as shown below. I think it will be more efficient.

import numpy as np
import pandas as pd

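# Date is kept as a string here, as in the original answer.
df = pd.DataFrame(
    [['1990', 'a', 5, 4, 7, 2],
     ['1991', 'c', 10, 1, 2, 0],
     ['1992', 'd', 2, 1, 4, 12],
     ['1993', 'a', 5, 8, 11, 6]],
    columns=('Date', 'best', 'a', 'b', 'c', 'd'))

# Start from the column names in "best"; each pass swaps a name for its value.
arr = df.best.values

cols = df.columns[2:]  # the value columns: a, b, c, d
for col in cols:
    arr2 = df[col].values
    # Where arr still holds this column's name, take the value from that column.
    arr = np.where(arr == col, arr2, arr)

# Keep only Date and best, then attach the looked-up values.
df.drop(columns=cols, inplace=True)
df["values"] = arr
df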

Result

   Date best values
0  1990    a      5
1  1991    c      2
2  1992    d     12
3  1993    a      5
