pandas: counting the elements of lists stored in a DataFrame column

Question

Asked by Gaurav Taneja

I need to get the frequency of each element across lists that are stored in a pandas DataFrame column.

Input data:

import pandas as pd

din = pd.DataFrame({'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']]})

              x
0     [a, b, c]
1  [a, e, d, c]

Desired Output:

I can expand the lists into rows and then do a group-by, but the data could be large (a million-plus records), so I was wondering whether there is a more efficient or direct way.

Thanks

Answer 1

Answered by jezrael

First flatten the values of the lists, then count them with value_counts, size, or Counter:

a = pd.Series([item for sublist in din.x for item in sublist])

Or:

import numpy as np

a = pd.Series(np.concatenate(din.x))

df = a.value_counts().sort_index().rename_axis('x').reset_index(name='f')

Or:

df = a.groupby(a).size().rename_axis('x').reset_index(name='f')

Or, with Counter:

from collections import Counter
from itertools import chain

df = pd.Series(Counter(chain(*din.x))).sort_index().rename_axis('x').reset_index(name='f')

print(df)
   x  f
0  a  2
1  b  1
2  c  2
3  d  1
4  e  1
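
For reference, the Counter variant above can be put together as one self-contained script. This is a minimal sketch that assumes the sample `din` from the question; the helper name `count_elements` is hypothetical and not part of the answer, and `chain.from_iterable` is swapped in for `chain(*din.x)` so the lists are not unpacked as separate arguments.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample frame from the question.
din = pd.DataFrame({'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']]})


def count_elements(frame, col):
    """Hypothetical helper: count the list elements stored in frame[col]."""
    # Flatten the lists lazily and tally each element.
    counts = Counter(chain.from_iterable(frame[col]))
    # Turn the tally into a two-column frame: element, frequency.
    return (pd.Series(counts)
            .sort_index()
            .rename_axis(col)
            .reset_index(name='f'))


df = count_elements(din, 'x')
print(df)
```

Whether this beats the value_counts variant on a million-plus rows is not something the answer measures, so it is worth timing both on real data.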

Answer 2

Answered by tmsss

You can also use a one-liner like this:

df = pd.Series(sum([item for item in din.x], [])).value_counts()
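
One caveat for large inputs: `sum(..., [])` builds a new list on every addition, so its cost grows quadratically with the number of rows. A similar one-liner can be sketched with `Series.explode`, assuming pandas 0.25 or later; this is an alternative, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
din = pd.DataFrame({'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']]})

# explode() gives each list element its own row; value_counts() then tallies them.
counts = din['x'].explode().value_counts()
print(counts)
```

This is the "expand into rows, then count" route the question mentions, but done by pandas itself rather than by hand.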
