Pandas match on multiple columns and get matching values as a single new column

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/48914328/

Time: 2020-09-14 05:13:19  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Conquest

I have a dataframe with about 5 columns. The value I am looking to match could be present in any of the last 3 columns.

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
3        fgh       20        22      23
4        lmn       20        22      21

I am filtering on the value 21 in any of the last three columns as follows:

df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]

which gives me

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
4        lmn       20        22      21

Using this new df1, I want to get this:

Key   |  col1   |  newCol
-------------------------
1        abc       21      
2        cde       21      
4        lmn       21      

Basically, whichever column matched supplies the new column value. How do I do this using pandas? I was thinking maybe I should filter and map to the new column at the same time, but I don't know how. I appreciate the help.

Accepted answer by jpp

Here is one way.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
        .assign(newCol=21)\
        .drop(['col2', 'col3', 'col4'], axis=1)  # axis must be passed by keyword in recent pandas

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation

  • Store integers as integers rather than strings.
  • np.logical_or.reduce applies your | condition across the list of per-column masks built by the list comprehension.
  • assign creates a new column holding the filter value.
  • drop removes the unwanted columns; axis=1 refers to columns.
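The first two points can be sketched as follows. This is only an illustration under assumptions: the frame df_str is a hypothetical version of the question's data in which the numeric columns were read in as strings, which would explain why the asker compared against '21'. The names df_str, df_int, and value_cols are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the numeric columns were loaded as strings.
df_str = pd.DataFrame({'Key': [1, 2, 3, 4],
                       'col1': ['abc', 'cde', 'fgh', 'lmn'],
                       'col2': ['21', '22', '20', '20'],
                       'col3': ['22', '21', '22', '22'],
                       'col4': ['23', '20', '23', '21']})

value_cols = ['col2', 'col3', 'col4']

# Convert the string columns to integers so that `== 21` can match.
df_int = df_str.astype({col: int for col in value_cols})

# np.logical_or.reduce ORs the per-column boolean masks together...
mask_reduce = np.logical_or.reduce([df_int[col] == 21 for col in value_cols])

# ...which gives the same mask as chaining `|` by hand.
mask_chained = ((df_int['col2'] == 21)
                | (df_int['col3'] == 21)
                | (df_int['col4'] == 21))

result = df_int[mask_reduce].assign(newCol=21).drop(columns=value_cols)
print(result)
```

The reduce form is mostly a convenience: adding a fourth column to search only means extending value_cols, not writing another | term.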

Answer by Zero

Use

In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
   Key col1  newcol
0    1  abc      21
1    2  cde      21
3    4  lmn      21


Details

Run the equality check eq on the relevant ['col2', 'col3', 'col4'] columns:

In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
    col2   col3   col4
0   True  False  False
1  False   True  False
2  False  False  False
3  False  False   True

any(axis=1) returns whether any element in each row is True:

In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0     True
1     True
2    False
3     True
dtype: bool

Use .loc to subset the matched rows and the needed ['Key', 'col1'] columns:

In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
   Key col1
0    1  abc
1    2  cde
3    4  lmn

Finally, .assign(newcol=21) creates a newcol column set to 21.
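If the same filter is needed for different values or column sets, the steps above could be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name filter_any_match and its parameters are hypothetical, and it only uses the eq, any, .loc, and assign calls already shown.

```python
import pandas as pd


def filter_any_match(df, value, search_cols, keep_cols, new_col='newcol'):
    """Keep rows where `value` appears in any of `search_cols`.

    Returns only `keep_cols` plus a new column holding `value`.
    (Hypothetical helper; names are illustrative.)
    """
    # True for each row in which at least one searched column equals value.
    mask = df[search_cols].eq(value).any(axis=1)
    # Subset rows and columns in one step, then attach the matched value.
    return df.loc[mask, keep_cols].assign(**{new_col: value})


df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

out = filter_any_match(df, 21, ['col2', 'col3', 'col4'], ['Key', 'col1'])
print(out)
```

Because the new column is just the value that was searched for, no per-row lookup is needed: every surviving row matched that value by construction.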

Answer by AdmiralWen

As jpp pointed out, you have 2 possibilities here: both 21 and 22 are common across all 3 columns. Assuming you don't know which one you're really looking for, you can use set() to isolate the unique values of each column, then use set.intersection() to find the values they share:

import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])

# .tolist() yields plain Python ints, so the set prints as {21, 22}
s1 = set(df['col2'].tolist())
s2 = set(df['col3'].tolist())
s3 = set(df['col4'].tolist())

df['new_col'] = str(s1.intersection(s2, s3))
df

col1    col2    col3    col4    new_col
   a    21      22      23      {21, 22}
   b    22      21      20      {21, 22}
   c    20      22      21      {21, 22}
   d    21      21      22      {21, 22}
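The same idea extends to any list of columns by unpacking the per-column sets into set.intersection. This is a small sketch under assumptions: the names value_cols, column_sets, and common are illustrative and not from the original answer. Note that the intersection is computed over whole columns, which is why every row in the output above shows the same new_col value.

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

value_cols = ['col2', 'col3', 'col4']

# One set of unique values per column (plain Python ints via tolist()).
column_sets = [set(df[col].tolist()) for col in value_cols]

# Values that occur somewhere in every one of the columns.
common = set.intersection(*column_sets)
print(sorted(common))  # [21, 22]
```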