pandas: How to count elements in each list of a DataFrame using pandas?

Question

Asked by

Given such a data frame df:

0     1
1     [12]
1     [13]
2     [11,12]
1     [10,0,1]
....

I'd like to count a certain value, for instance '12', in each list in df. So I tried:

df.apply(list.count('12'))

but got the error: TypeError: descriptor 'count' requires a 'list' object but received a 'str'. But the values in df[1] are lists! How can I fix this? Thanks!
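Why the original call fails: list.count('12') is evaluated before apply ever runs, and because count is looked up on the list class rather than on a list instance, Python treats '12' as the list to search (self), hence the TypeError. Also, df.apply works column by column on the whole DataFrame, while the lists live in the single column df[1]. A minimal sketch, assuming a column of Python lists like the one above (the frame is rebuilt here for illustration), passes a callable to the column instead:

```python
import pandas as pd

# Column 1 holds Python lists, as in the question
df = pd.DataFrame({0: [1, 1, 2, 1], 1: [[12], [13], [11, 12], [10, 0, 1]]})

# list.count accessed on the class is unbound: its first argument must be a list
try:
    list.count('12')
except TypeError as exc:
    print("TypeError:", exc)

# Pass a callable so apply hands each list to it; count 12 (int), not '12' (str),
# because the lists contain integers
counts = df[1].apply(lambda x: x.count(12))
print(counts.tolist())  # [1, 0, 1, 0]
```

Counting the string '12' instead would return 0 for every row, since no list contains that string.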

Answer 1

Accepted answer by jezrael

I think you can first select the column as a Series with ix (removed in pandas 1.0; loc does the same label-based selection here) and then apply the function x.count(12):

import pandas as pd

d = { 0:pd.Series([1,1,2,1]),
      1:pd.Series([[12], [13], [11,12 ],[10,0,1]])}

df = pd.DataFrame(d)  

print(df)
   0           1
0  1        [12]
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print(df.loc[:, 1])  # .ix no longer exists in pandas; .loc selects by label
0          [12]
1          [13]
2      [11, 12]
3    [10, 0, 1]
Name: 1, dtype: object

print(df.loc[:, 1].apply(lambda x: x.count(12)))
0    1
1    0
2    1
3    0
Name: 1, dtype: int64

Or use iloc for position-based selection:

print(df.iloc[:, 1].apply(lambda x: x.count(12)))
0    1
1    0
2    1
3    0
Name: 1, dtype: int64
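As an alternative that avoids calling a Python lambda per row, Series.explode (available since pandas 0.25) puts each list element on its own row while repeating the original index label, so an element-wise comparison followed by a sum per index label gives the same per-row counts. This is a swapped-in technique, not part of the answer above; a sketch rebuilding the same example frame:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1], 1: [[12], [13], [11, 12], [10, 0, 1]]})

# One row per list element; the original row label is repeated for each element
exploded = df[1].explode()

# Mark matches as 1/0, then add them back up per original row label
counts = exploded.eq(12).astype(int).groupby(level=0).sum()
print(counts.tolist())  # [1, 0, 1, 0]
```

Empty lists and NaN cells both explode to a single NaN, which compares unequal to 12, so such rows get a count of 0 instead of raising an error.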

EDIT:

I think column 1 contains NaN.

You can use:

print(df)
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print(df.loc[:, 1].notnull())
0    False
1     True
2     True
3     True
Name: 1, dtype: bool

print(df.loc[df.loc[:, 1].notnull(), 1].apply(lambda x: x.count(12)))
1    0
2    1
3    0
Name: 1, dtype: int64
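If you would rather keep every row, with 0 for the NaN rows instead of dropping them, one option (a sketch, not from the answer; it rebuilds the NaN example frame shown above) is to count on the non-null subset and then reindex back to the full index with fill_value=0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1], 1: [np.nan, [13], [11, 12], [10, 0, 1]]})

col = df[1]
# Count only where the cell holds a value; the NaN row is skipped
counts = col[col.notnull()].apply(lambda x: x.count(12))
# Restore the original index; rows that were NaN get 0
counts = counts.reindex(col.index, fill_value=0)
print(counts.tolist())  # [0, 0, 1, 0]
```

The result stays aligned with df, so it can be assigned back as a new column without index mismatches.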

EDIT2:

If you want to filter by index (e.g. 0:2) and drop the NaN values in column 1:

print(df)
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

# filter df by index: only labels 0 to 2
print(df.loc[0:2, 1])
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

# boolean Series: True where the filtered values are not null
print(df.loc[0:2, 1].notnull())
0    False
1     True
2     True
Name: 1, dtype: bool

# column 1, first filtered to index 0:2 and then to the non-null values
print(df.loc[0:2, 1][df.loc[0:2, 1].notnull()])
1        [13]
2    [11, 12]
Name: 1, dtype: object

# same as above, but more readable
df1 = df.loc[0:2, 1]
print(df1)
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

print(df1[df1.notnull()])
1        [13]
2    [11, 12]
Name: 1, dtype: object

# apply count
print(df1[df1.notnull()].apply(lambda x: x.count(12)))
1    0
2    1
Name: 1, dtype: int64
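With this integer index, the 0:2 slice above selects by label, so it includes the end label 2 and returns three rows; positional iloc excludes the end position. A sketch contrasting the two, and using dropna() as a shorthand for s[s.notnull()] (the frame is the NaN example rebuilt for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1], 1: [np.nan, [13], [11, 12], [10, 0, 1]]})

by_label = df.loc[0:2, 1]      # labels 0, 1 and 2: the end label is included
by_position = df.iloc[0:2, 1]  # positions 0 and 1: the end position is excluded
print(len(by_label), len(by_position))  # 3 2

# dropna() removes the NaN cell, like by_label[by_label.notnull()]
counts = by_label.dropna().apply(lambda x: x.count(12))
print(counts.to_dict())  # {1: 0, 2: 1}
```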

Answer 2

Answer by Romain

The count has to be applied to the column:

import pandas as pd

# Test data
df = pd.DataFrame({1: [[1], [12], [13], [11,12], [10,0,1]]})

df[1].apply(lambda x: x.count(12))

0    0
1    1
2    0
3    1
4    0
Name: 1, dtype: int64

A modification to handle the case where some values are not stored in a list:

# An example with values not stored in list 
df = pd.DataFrame({1: [12, [12], [13], [11,12], [10,0,1], 1]})

_check = 12
df[1].apply(lambda l: l.count(_check) if isinstance(l, list) else int(l == _check))

0    1
1    1
2    0
3    1
4    0
5    0
Name: 1, dtype: int64

Answer 3

Answer by Alexander

You can use a conditional generator expression:

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1, 1, 2], 1: [np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]]})

target = 12
>>> sum(sub_list.count(target)
...     for sub_list in df.iloc[:, 1]
...     if not np.isnan(sub_list).all())
3

This is like the following conditional list comprehension:

这类似于以下条件列表推导式：

>>> [sub_list.count(12) for sub_list in df.iloc[:, 1] if not np.isnan(sub_list).all()]
[0, 1, 0, 1, 1]

The difference is that the generator expression lazily produces one count at a time instead of first building the entire list, so it generally uses less memory.
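To make the comparison concrete, a pure-Python sketch: the cells mirror column 1 of the frame above, and isinstance(c, list) stands in for the np.isnan test, assuming the only non-list cells are NaN scalars. Both forms give the same total, but only the list comprehension materialises the per-cell counts.

```python
import math

# Cells as they appear in column 1 above: a NaN scalar or a list
cells = [math.nan, [13], [11, 12], [10, 0, 1], [12], [math.nan, 12]]
target = 12

# List comprehension: builds the whole list of per-cell counts first
counts = [c.count(target) for c in cells if isinstance(c, list)]
print(counts)  # [0, 1, 0, 1, 1]

# Generator expression: sum() pulls one count at a time; no list is built
total = sum(c.count(target) for c in cells if isinstance(c, list))
print(total)  # 3
```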
