Comparing pandas DataFrame column values against a list

Question

Asked by Mehrdad Salimi

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})


A   B   C
1   10  100
1   15  150
2   20  200
2   25  250
3   30  300
3   35  350

I used this to get the values of column C for the first row of each group:

firsts = df.groupby('A').first()['C']

So firsts will be: (100, 200, 300).

Now I want to add a new column that is '1' if the row's value in column C is in firsts, and '0' otherwise.

   A    B   C   D
   1    10  100 1
   1    15  150 0
   2    20  200 1
   2    25  250 0
   3    30  300 1
   3    35  350 0

I used this:

df['D'] = df['C'].apply(lambda x: 1 if x in firsts else 0)

But the output is:

   A    B   C   D
   1    10  100 0
   1    15  150 0
   2    20  200 0
   2    25  250 0
   3    30  300 0
   3    35  350 0

I would appreciate it if anyone could explain why my solution is wrong and what the actual solution to this problem is.

Answer 1

Answered by Psidom

You can use the isin method:

df['D'] = df.C.isin(firsts).astype(int)

df
#   A   B   C   D
#0  1   10  100 1
#1  1   15  150 0
#2  2   20  200 1
#3  2   25  250 0
#4  3   30  300 1
#5  3   35  350 0
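
As a minimal sketch, assuming the same df as in the question: isin also accepts a plain Python list, and it returns a boolean mask before the cast to int. The name wanted is made up for illustration and does not appear in the original posts.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})

# 'wanted' is a hypothetical name for the list of values to match
wanted = [100, 200, 300]

mask = df['C'].isin(wanted)    # boolean Series, one entry per row
df['D'] = mask.astype(int)     # True/False -> 1/0

print(mask.tolist())
print(df['D'].tolist())
```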

Your approach fails because the Python in operator checks the index of a Series rather than its values, just as it checks the keys of a dictionary:

firsts
#A
#1    100
#2    200
#3    300
#Name: C, dtype: int64

1 in firsts
# True

100 in firsts
# False

2 in firsts
# True

200 in firsts
# False
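
The checks above can be turned around to test the values explicitly. This is a small sketch, assuming the df and firsts from the question; the variable names label_hit, value_miss and value_hit are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})
firsts = df.groupby('A').first()['C']

# `in` on a Series looks at the index labels (here the group keys 1, 2, 3)
label_hit = 1 in firsts
value_miss = 100 in firsts

# To test the stored values, ask for them explicitly
value_hit = 100 in firsts.values

print(label_hit, value_miss, value_hit)   # True False True
```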

Modifying your method as follows works:

firstSet = set(firsts)
df['C'].apply(lambda x: 1 if x in firstSet else 0)

#0    1
#1    0
#2    1
#3    0
#4    1
#5    0
#Name: C, dtype: int64

Answer 2

Answered by MaxU

You can also do it in one step using .transform('first'):

In [280]: df['D'] = df.groupby('A')['C'].transform('first').eq(df['C']).astype(np.int8)

In [281]: df
Out[281]:
   A   B    C  D
0  1  10  100  1
1  1  15  150  0
2  2  20  200  1
3  2  25  250  0
4  3  30  300  1
5  3  35  350  0

Explanation: GroupBy.transform('func') returns a vector with the same length as the original DataFrame, with func applied to each group:

In [14]: df.groupby('A')['C'].transform('first')
Out[14]:
0    100
1    100
2    200
3    200
4    300
5    300
Name: C, dtype: int64

In [15]: df.groupby('A')['C'].transform('max')
Out[15]:
0    150
1    150
2    250
3    250
4    350
5    350
Name: C, dtype: int64

In [16]: df.groupby('A')['C'].transform('min')
Out[16]:
0    100
1    100
2    200
3    200
4    300
5    300
Name: C, dtype: int64

In [17]: df.groupby('A')['C'].transform('mean')
Out[17]:
0    125
1    125
2    225
3    225
4    325
5    325
Name: C, dtype: int64

In [18]: df.groupby('A')['C'].transform('sum')
Out[18]:
0    250
1    250
2    450
3    450
4    650
5    650
Name: C, dtype: int64
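
One practical difference between the transform approach and isin may be worth a sketch. It assumes a hypothetical variation of the question's data (df2, not from the original posts) in which a group's first value also appears in another group: isin matches against the first values of all groups, while transform compares each row only with the first value of its own group.

```python
import pandas as pd

# Hypothetical data: 200 is the first value of group 2 but also occurs in group 1
df2 = pd.DataFrame({'A': [1, 1, 2, 2],
                    'C': [100, 200, 200, 250]})

firsts = df2.groupby('A')['C'].first()   # one value per group: 100, 200

# isin: a row matches if C equals the first value of ANY group
by_isin = df2['C'].isin(firsts).astype(int)

# transform: a row matches only if C equals the first value of ITS OWN group
by_transform = (df2.groupby('A')['C'].transform('first')
                   .eq(df2['C']).astype(int))

print(by_isin.tolist())        # [1, 1, 1, 0]
print(by_transform.tolist())   # [1, 0, 1, 0]
```

On the data in the question both approaches give the same column, because no first value repeats elsewhere.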

Answer 3

Answered by Friedrich

TL;DR:

df['newColumn'] = np.where((df.compareColumn.isin(yourlist)), TrueValue, FalseValue)

Another one-step method would be to use np.where() and isin.

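import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})

# C values from the first row of each group in A (as in the question)
firsts = df.groupby('A').first()['C']

# 1 where C is one of those values, otherwise 0
df['D'] = np.where(df.C.isin(firsts), 1, 0)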

We use the result of isin as the condition in np.where() to return either

1 when True
0 when False

and assign them to a new column in the same dataframe, df['D'].

Note: np.where allows more complex conditions built with bitwise operators, as well as replacement cases, e.g. a 'bypass' on False that keeps the original value:

df['col1'] = np.where(((df['col1'] == df['col2']) &
                       (~df['col1'].str.startswith('r'))),
                       'replace', df['col1'])
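
To make the snippet above runnable, here is a minimal sketch with made-up sample data; the values in col1 and col2 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'col1': ['red', 'blue', 'green', 'rose'],
                   'col2': ['red', 'blue', 'gray', 'rose']})

# Equal to col2 AND not starting with 'r' -> 'replace'; otherwise keep col1
cond = (df['col1'] == df['col2']) & (~df['col1'].str.startswith('r'))
df['col1'] = np.where(cond, 'replace', df['col1'])

print(df['col1'].tolist())   # ['red', 'replace', 'green', 'rose']
```

Only 'blue' satisfies both parts of the condition, so it is the only value replaced; every other row falls through to the original col1 value.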
