Pandas: match on multiple columns and get the matched value as a single new column

Question

Asked by Conquest

I have a dataframe with about 5 columns. The value I am looking to match could be present in any of the last 3 columns.

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
3        fgh       20        22      23
4        lmn       20        22      21

I am filtering on the value 21 in any of the last three columns as follows:

df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]

which gives me

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
4        lmn       20        22      21

Using this new df1 I want to get this

Key   |  col1   |  newCol
-------------------------
1        abc       21      
2        cde       21      
4        lmn       21

Basically, whichever column matched should supply the new column's value. How do I do this using pandas? I was thinking that maybe I should filter and map the value to the new column at the same time, but I don't know how. I appreciate the help.

Answer 1

Accepted answer by jpp

Here is one way.

import pandas as pd, numpy as np

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

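# Keep rows where any of the three columns equals 21, then replace those
# columns with a single newCol holding the filter value.
# axis is passed by keyword: pandas 2.x no longer accepts it positionally in drop().
df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
        .assign(newCol=21)\
        .drop(['col2', 'col3', 'col4'], axis=1)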

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation

Store integers as integers rather than strings.
np.logical_or.reduce applies your | condition across a list comprehension.
assign creates a new column with the filter value.
drop removes unwanted columns; axis=1 refers to columns.
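
To make the steps above concrete, here is a minimal sketch that checks the np.logical_or.reduce mask against the chained | form from the question and wraps the whole pipeline in a function. The helper name filter_to_new_col and its parameters are hypothetical, introduced only for illustration; they are not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])


def filter_to_new_col(frame, cols, value, new_name='newCol'):
    """Hypothetical helper: keep rows where any of `cols` equals `value`,
    drop `cols`, and add `new_name` holding `value`."""
    # One boolean Series per column, OR-ed together element-wise.
    mask = np.logical_or.reduce([frame[c] == value for c in cols])
    return frame[mask].assign(**{new_name: value}).drop(columns=cols)


cols = ['col2', 'col3', 'col4']

# The reduced mask should agree with the hand-written chain of | operators.
chained = (df['col2'] == 21) | (df['col3'] == 21) | (df['col4'] == 21)
reduced = np.logical_or.reduce([df[c] == 21 for c in cols])
same_mask = bool((chained.to_numpy() == np.asarray(reduced)).all())

result = filter_to_new_col(df, cols, 21)
print(same_mask)
print(result)
```

The advantage of the reduce form is that the column list can grow without the expression being rewritten by hand.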

Answer 2

Answer by Zero

Use

In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
   Key col1  newcol
0    1  abc      21
1    2  cde      21
3    4  lmn      21

Details

Equality check with eq on the necessary ['col2', 'col3', 'col4'] columns

In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
    col2   col3   col4
0   True  False  False
1  False   True  False
2  False  False  False
3  False  False   True

any with axis=1 returns whether any element in each row is True

In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0     True
1     True
2    False
3     True
dtype: bool

Use .loc to subset the matched rows and the necessary ['Key', 'col1'] columns.

In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
   Key col1
0    1  abc
1    2  cde
3    4  lmn

Finally, .assign(newcol=21) creates a newcol column set to 21.
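
If more than one value may count as a match, the new column cannot be hard-coded. The sketch below is one possible extension, not part of the original answer: it takes the first matching value in each row, scanning left to right. The helper name first_match_column and the targets list are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])


def first_match_column(frame, cols, targets, keep=('Key', 'col1')):
    """Hypothetical helper: for each row, take the first value in `cols`
    (left to right) that appears in `targets`; drop rows with no match."""
    sub = frame[cols]
    # Non-matching cells become NaN, matching cells keep their value.
    matched = sub.where(sub.isin(targets))
    # Back-fill along each row so the first column holds the first match.
    first = matched.bfill(axis=1).iloc[:, 0]
    hits = first.dropna().astype(int)
    # assign aligns on the index, so only matched rows receive a value.
    return frame.loc[hits.index, list(keep)].assign(newcol=hits)


cols = ['col2', 'col3', 'col4']
single = first_match_column(df, cols, [21])
multi = first_match_column(df, cols, [21, 23])
print(single)
print(multi)
```

With a single target this reproduces the result shown above; with several targets, each row reports whichever target it hit first.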

Answer 3

Answer by AdmiralWen

As jpp pointed out, you have 2 possibilities here: in the example data below, both 21 and 22 appear in all 3 columns. Assuming you don't know which one you're really looking for, you can use set() to isolate the unique values of each column, then use set.intersection() to find the values they have in common:

import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])

s1 = set(df['col2'].values)
s2 = set(df['col3'].values)
s3 = set(df['col4'].values)

df['new_col'] = str(s1.intersection(s2, s3))
df

col1    col2    col3    col4    new_col
   a    21      22      23      {21, 22}
   b    22      21      20      {21, 22}
   c    20      22      21      {21, 22}
   d    21      21      22      {21, 22}
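
Note that the assignment above stores the same string in every row. As a small sketch under the same example data, the intersection can also be computed for an arbitrary list of columns and kept as a real set; the names cols and common are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

cols = ['col2', 'col3', 'col4']

# Build one set of unique values per column, then intersect them all.
# tolist() converts NumPy integers to plain Python ints.
common = set.intersection(*(set(df[c].tolist()) for c in cols))
print(sorted(common))
```

Keeping the result as a set makes it easy to pass on to a later filter, for example with isin.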
