Original question: http://stackoverflow.com/questions/48914328/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas match on multiple columns and get matching values as a single new column
Asked by Conquest
I have a dataframe with about 5 columns. The value I am looking to match could be present in any of the last 3 columns.
Key | col1 | col2 | col3 | col4
----------------------------------------
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
3   | fgh  | 20   | 22   | 23
4   | lmn  | 20   | 22   | 21
I am filtering on the value 21 in any of the last three columns as follows:
df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]
which gives me
Key | col1 | col2 | col3 | col4
----------------------------------------
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
4   | lmn  | 20   | 22   | 21
Using this new df1 I want to get this
Key | col1 | newCol
-------------------------
1   | abc  | 21
2   | cde  | 21
4   | lmn  | 21
Basically, whichever column matched supplies the value of the new column. How do I do this using pandas? I was thinking that maybe I should filter and map the value to the new column at the same time, but I don't know how. I appreciate the help.
Accepted answer by jpp
Here is one way.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

# Keep rows where any of col2..col4 equals 21, add newCol, then drop the originals.
# axis is passed by keyword because recent pandas releases reject it positionally.
df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
    .assign(newCol=21)\
    .drop(['col2', 'col3', 'col4'], axis=1)

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21
Explanation
- Store integers as integers rather than strings.
- np.logical_or.reduce applies your | condition across a list comprehension.
- assign creates a new column with the filter value.
- drop removes unwanted columns; axis=1 refers to columns.
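To make the role of np.logical_or.reduce more concrete, here is a minimal sketch that builds the row mask as a separate step, using the same sample data as the answer. The names `cols`, `target`, `mask`, and `result` are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

cols = ['col2', 'col3', 'col4']  # illustrative: columns to search
target = 21                      # illustrative: value to match

# One boolean Series per column; reduce ORs them together element-wise,
# which is equivalent to chaining the comparisons with |.
mask = np.logical_or.reduce([df[c] == target for c in cols])

# Filter with the mask, add the matched value, drop the searched columns.
result = df[mask].assign(newCol=target).drop(cols, axis=1)
print(result)
```

Keeping the mask in its own variable makes it easy to inspect which rows matched before any columns are dropped.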
Answer by Zero
Use
In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
Key col1 newcol
0 1 abc 21
1 2 cde 21
3 4 lmn 21
Details
Equality check with eq on the necessary ['col2', 'col3', 'col4'] columns:
In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
col2 col3 col4
0 True False False
1 False True False
2 False False False
3 False False True
any with axis=1 returns, for each row, whether any element in that row is True:
In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0 True
1 True
2 False
3 True
dtype: bool
Use .loc to subset the matched rows and the necessary ['Key', 'col1'] columns:
In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
Key col1
0 1 abc
1 2 cde
3 4 lmn
Finally, .assign(newcol=21) creates a newcol column set to 21.
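The steps above can be wrapped into a small reusable function. This is only a sketch: `filter_any_match` and its parameter names are hypothetical and not part of pandas. `any` is called with `axis=1` as a keyword, which recent pandas releases require.

```python
import pandas as pd


def filter_any_match(df, value, match_cols, keep_cols, new_col="newcol"):
    """Hypothetical helper: keep rows where any of match_cols equals value."""
    # eq builds a boolean frame; any(axis=1) collapses it to one flag per row.
    mask = df[match_cols].eq(value).any(axis=1)
    # Select matched rows and the columns to keep, then add the matched value.
    return df.loc[mask, keep_cols].assign(**{new_col: value})


df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

out = filter_any_match(df, 21, ['col2', 'col3', 'col4'], ['Key', 'col1'])
print(out)
```

A value that appears in none of the searched columns simply yields an empty frame with the same three columns.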
Answer by AdmiralWen
As jpp pointed out, you have 2 possibilities here: both 21 and 22 are common across all 3 columns. Assuming you don't know which one you're really looking for, you can use set() to isolate the unique values of each column, then use set.intersection() to find the values they have in common:
import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])
# Unique values of each column
s1 = set(df['col2'].values)
s2 = set(df['col3'].values)
s3 = set(df['col4'].values)
# Values present in all three columns, stored as a string in every row
df['new_col'] = str(s1.intersection(s2, s3))
df
col1 col2 col3 col4 new_col
a 21 22 23 {21, 22}
b 22 21 20 {21, 22}
c 20 22 21 {21, 22}
d 21 21 22 {21, 22}
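The same intersection can be computed for any number of columns without naming one set per column. The sketch below is one way to do it, on the same sample data; `cols`, `common`, and `common_sorted` are illustrative names, and the values are converted to plain Python ints only to make the printed result easy to read.

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

cols = ['col2', 'col3', 'col4']  # illustrative: columns to compare

# Build one set of unique values per column and intersect them all at once.
common = set.intersection(*(set(df[c]) for c in cols))

# Sort and convert to plain ints for stable, readable output.
common_sorted = sorted(int(v) for v in common)
print(common_sorted)  # [21, 22]
```

Each value in the result could then be used as the match value in one of the filters from the earlier answers.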