Comparison of a Dataframe column values with a list

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original address: http://stackoverflow.com/questions/44218378/

Date: 2020-09-14 03:42:01  Source: igfitidea

python, pandas, dataframe

Asked by Mehrdad Salimi

Consider this Dataframe:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})


A   B   C
1   10  100
1   15  150
2   20  200
2   25  250
3   30  300
3   35  350

I used this to get the value of column C for the first row of each group:

firsts = df.groupby('A').first()['C']

So firsts will be: (100, 200, 300).

Now I want to add a new column that is '1' if the row's value in column C is in firsts, and '0' otherwise.

   A    B   C   D
   1    10  100 1
   1    15  150 0
   2    20  200 1
   2    25  250 0
   3    30  300 1
   3    35  350 0

I used this:

df['D'] = df['C'].apply(lambda x: 1 if x in firsts else 0)

But the output is:

   A    B   C   D
   1    10  100 0
   1    15  150 0
   2    20  200 0
   2    25  250 0
   3    30  300 0
   3    35  350 0

I would appreciate it if anyone could explain why my solution is wrong and what the actual solution to this problem is.

Answered by Psidom

You can use the isin method:

df['D'] = df.C.isin(firsts).astype(int)

df
#   A   B   C   D
#0  1   10  100 1
#1  1   15  150 0
#2  2   20  200 1
#3  2   25  250 0
#4  3   30  300 1
#5  3   35  350 0


The reason your approach fails is that the Python in operator checks the index of a Series instead of its values, the same way it checks the keys of a dictionary rather than the values:

firsts
#A
#1    100
#2    200
#3    300
#Name: C, dtype: int64

1 in firsts
# True

100 in firsts
# False

2 in firsts
# True

200 in firsts
# False
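
The dictionary comparison can be checked directly. This is a minimal sketch that rebuilds firsts from the question's data; the name as_dict is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'C': [100, 150, 200, 250, 300, 350]})
firsts = df.groupby('A').first()['C']

# `in` on a Series tests the index labels (1, 2, 3), like dict keys
as_dict = firsts.to_dict()
print(1 in firsts, 1 in as_dict)        # True True
print(100 in firsts, 100 in as_dict)    # False False

# To test membership among the values, look at the values explicitly
print(100 in firsts.values)             # True
```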

Modifying your method as follows works:

firstSet = set(firsts)
df['C'].apply(lambda x: 1 if x in firstSet else 0)

#0    1
#1    0
#2    1
#3    0
#4    1
#5    0
#Name: C, dtype: int64

Answered by MaxU

You can also do it in one step using .transform('first'):

In [280]: df['D'] = df.groupby('A')['C'].transform('first').eq(df['C']).astype(np.int8)

In [281]: df
Out[281]:
   A   B    C  D
0  1  10  100  1
1  1  15  150  0
2  2  20  200  1
3  2  25  250  0
4  3  30  300  1
5  3  35  350  0

Explanation: GroupBy.transform('func') returns a vector with the same length as the original DF, with func applied to each group and the result repeated for every row of that group:

In [14]: df.groupby('A')['C'].transform('first')
Out[14]:
0    100
1    100
2    200
3    200
4    300
5    300
Name: C, dtype: int64

In [15]: df.groupby('A')['C'].transform('max')
Out[15]:
0    150
1    150
2    250
3    250
4    350
5    350
Name: C, dtype: int64

In [16]: df.groupby('A')['C'].transform('min')
Out[16]:
0    100
1    100
2    200
3    200
4    300
5    300
Name: C, dtype: int64

In [17]: df.groupby('A')['C'].transform('mean')
Out[17]:
0    125
1    125
2    225
3    225
4    325
5    325
Name: C, dtype: int64

In [18]: df.groupby('A')['C'].transform('sum')
Out[18]:
0    250
1    250
2    450
3    450
4    650
5    650
Name: C, dtype: int64
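
To contrast this with a plain aggregation, here is a small sketch on the question's data. The names agg and per_row are used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'C': [100, 150, 200, 250, 300, 350]})

agg = df.groupby('A')['C'].first()                  # one value per group, indexed by A
per_row = df.groupby('A')['C'].transform('first')   # one value per row, indexed like df

print(len(agg), len(per_row))            # 3 6
print(per_row.index.equals(df.index))    # True

# Because per_row is aligned with df, it can be compared to df['C'] row by row
df['D'] = per_row.eq(df['C']).astype(int)
print(df['D'].tolist())                  # [1, 0, 1, 0, 1, 0]
```

Note that this compares each row with the first value of its own group, whereas isin(firsts) matches the first value of any group. The two agree here because the values in C are all distinct.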

Answered by Friedrich

TL;DR:

df['newColumn'] = np.where((df.compareColumn.isin(yourlist)), TrueValue, FalseValue)


Another one-step method would be to use np.where() and isin.

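import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})

# first value of C in each group of A, as in the question
firsts = df.groupby('A').first()['C']

# flag rows whose C value appears in firsts
df['D'] = np.where(df.C.isin(firsts), 1, 0)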

We use the return value of isin as the condition in np.where() to return either

  • 1 when True
  • 0 when False

and assign them to a new column in the same dataframe df['D'].
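
The intermediate mask can be inspected on its own, and the two replacement values need not be numbers. A minimal sketch, assuming the question's data; the column name 'label' and the strings 'first' and 'other' are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'C': [100, 150, 200, 250, 300, 350]})
firsts = df.groupby('A').first()['C']

mask = df['C'].isin(firsts)      # boolean Series, one entry per row
print(mask.tolist())             # [True, False, True, False, True, False]

# np.where picks the second argument where mask is True, the third elsewhere
df['label'] = np.where(mask, 'first', 'other')
print(df['label'].tolist())      # ['first', 'other', 'first', 'other', 'first', 'other']
```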

Note: np.where allows more complex conditions built with bitwise operators (&, |, ~), and the replacement values can be columns themselves, e.g. to 'bypass' on False by keeping the original value:

df['col1'] = np.where(((df['col1'] == df['col2']) &
                       (~df['col1'].str.startswith('r'))),
                       'replace', df['col1'])
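
A runnable version of the snippet above might look like this. The data is hypothetical and chosen only so that both conditions are exercised; same and starts_with_r are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the original question
df = pd.DataFrame({'col1': ['red', 'blue', 'green', 'rose'],
                   'col2': ['red', 'blue', 'gray', 'pink']})

same = df['col1'] == df['col2']                  # True, True, False, False
starts_with_r = df['col1'].str.startswith('r')   # True, False, False, True

# Replace only where both conditions hold; otherwise keep col1 unchanged
df['col1'] = np.where(same & ~starts_with_r, 'replace', df['col1'])
print(df['col1'].tolist())   # ['red', 'replace', 'green', 'rose']
```

Each comparison is wrapped in parentheses (or stored in a variable first) because & and ~ bind more tightly than == in Python.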