pandas group by with mode as aggregator

Question

Asked by Josh

I've got a set of survey responses that I'm trying to analyze with pandas. My goal is to find (for this example) the most common gender in each county in the US, so I use the following code:

import pandas as pd
from scipy import stats
file['sex'].groupby(file['county']).agg([('modeSex', stats.mode)])

The output is:

How can I unpack this to only get the mode value and not the second value that tells how often the mode occurs?

Here is a sample of the data frame:

county | sex
-------|----
079    | 1
079    | 2
079    | 2
075    | 1
075    | 1
075    | 1
075    | 2

Desired output is:

county | modeSex
-------|--------
079    | 2
075    | 1

Answer 1

Accepted answer by ayhan

When you use stats.mode(x)[0] and it returns an array (as in older SciPy releases, or with keepdims=True since SciPy 1.11 made scalars the default), Pandas complains about it (I guess a pandas cell cannot hold a NumPy array), so you can convert it to a list or a tuple:

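import numpy as np  # pd and stats are imported as in the question
df = pd.DataFrame({"C1": np.random.randint(10, size=100), "C2": np.random.choice(["X", "Y", "Z"], size=100)})
# keepdims=True (SciPy >= 1.9) keeps the mode as a 1-element array, as older SciPy always returned it
print(df.groupby(['C2']).agg(lambda x: tuple(stats.mode(x, keepdims=True)[0])))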

Out:

     C1
C2      
X   (0,)
Y   (4,)
Z   (3,)

Note that scipy.stats.mode reports only the smallest value when several values tie for most common, so the tuple always holds exactly one mode. If you want that mode as a plain value, you can extract it:

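# keepdims=True (SciPy >= 1.9): the first [0] selects the modes array, the second [0] its first element
df.groupby(['C2']).agg(lambda x: stats.mode(x, keepdims=True)[0][0])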

Out:

    C1
C2    
X    0
Y    4
Z    3
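Applied to the question's own sample data, the same indexing produces the desired county | modeSex table. This is a minimal sketch, assuming the county codes are stored as strings (so "079" keeps its leading zero) and SciPy 1.9 or later for the keepdims argument; it uses pandas' named aggregation (modeSex=...) in place of the question's list-of-tuples form.

```python
import pandas as pd
from scipy import stats

# The question's sample table; county codes assumed to be strings to keep leading zeros
file = pd.DataFrame({
    "county": ["079", "079", "079", "075", "075", "075", "075"],
    "sex": [1, 2, 2, 1, 1, 1, 2],
})

# keepdims=True gives 1-element arrays: [0] picks the modes array, the next [0] the single mode
result = file["sex"].groupby(file["county"]).agg(
    modeSex=lambda s: stats.mode(s, keepdims=True)[0][0]
)
print(result)
```

Groups come back sorted by county, so "075" is listed before "079".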

Answer 2

Answer by sid

scipy.stats.mode returns a pair: the modal values and the count for each mode. So we can use stats.mode(a)[0] to get only the first element of that pair, the modal value.
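To see what that pair looks like, and how ties are handled, here is a minimal sketch on a small made-up array, assuming SciPy 1.9 or later (for keepdims). For contrast it also calls pandas' own Series.mode, which keeps every tied value instead of only the smallest.

```python
import numpy as np
import pandas as pd
from scipy import stats

values = np.array([1, 2, 2, 3, 3])  # made-up data: 2 and 3 tie as most common

# keepdims=False returns scalars rather than 1-element arrays
res = stats.mode(values, keepdims=False)
modal_value, count = res  # the result unpacks like a (mode, count) pair
print(modal_value, count)  # SciPy reports only the smallest tied value, 2, seen twice

# pandas' Series.mode returns every tied mode
all_modes = pd.Series(values).mode().tolist()
print(all_modes)
```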

Here is the code:

import pandas as pd
from scipy import stats
# sample data frame
df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A'], 'Y': [1, 2, 3, 4]})
# use a lambda per column; keepdims=False (SciPy >= 1.9) makes stats.mode(x)[0] a scalar, not a 1-element array
print(df2.groupby(['X']).agg({'Y': lambda x: stats.mode(x, keepdims=False)[0]}))

output:
