Python Pandas: Pivot table with aggfunc = count unique distinct

Question

Asked by dmi

df2 = pd.DataFrame({'X' : ['X1', 'X1', 'X1', 'X1'], 'Y' : ['Y2','Y1','Y1','Y1'], 'Z' : ['Z3','Z1','Z1','Z2']})

    X   Y   Z
0  X1  Y2  Z3
1  X1  Y1  Z1
2  X1  Y1  Z1
3  X1  Y1  Z2

g=df2.groupby('X')

pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')

Traceback (most recent call last): ... AttributeError: 'Index' object has no attribute 'index'

How do I get a pivot table with counts of unique values of one DataFrame column for two other columns?
Is there an aggfunc for counting unique values? Should I be using np.bincount()?

NB: I am aware of Series.value_counts(); however, I need a pivot table.

EDIT: The output should be:

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Answer 1

Accepted answer by Chang She

Do you mean something like this?

In [39]: df2.pivot_table(values='X', rows='Y', cols='Z', 
                         aggfunc=lambda x: len(x.unique()))
Out[39]: 
Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Note that using len assumes you don't have NAs in your DataFrame. Otherwise, you can use x.value_counts().count() or len(x.dropna().unique()).
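
To make the NA caveat concrete, here is a minimal sketch on a single Series. The Series `s` is a hypothetical example that is not part of the original question; it only illustrates how the three expressions treat a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical group of X values containing one missing entry.
s = pd.Series(['X1', np.nan, 'X1'])

# unique() keeps NaN as its own value, so len() counts it.
n_len = len(s.unique())

# value_counts() drops NaN by default, so only real values are counted.
n_value_counts = s.value_counts().count()

# Dropping NaN explicitly before unique() gives the same result.
n_dropna = len(s.dropna().unique())

print(n_len, n_value_counts, n_dropna)
```

Either of the last two expressions can be wrapped in a lambda and passed as aggfunc when the data may contain missing values.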

Answer 2

Answer by Pablo Navarro

You can construct a pivot table for each distinct value of X. In this case,

for xval, xgroup in g:
    ptable = pd.pivot_table(xgroup, rows='Y', cols='Z', 
        margins=False, aggfunc=numpy.size)

will construct a pivot table for each value of X. You may want to index ptable using xval. With this code, I get (for X1):

     X        
Z   Z1  Z2  Z3
Y             
Y1   2   1 NaN
Y2 NaN NaN   1
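
One way to keep every per-X table around is to store it in a dict keyed by the group value. The sketch below assumes a recent pandas (index/columns keywords instead of rows/cols); the dict name `ptables` and the use of aggfunc='count' on the X column are choices made for this illustration, not part of the original answer. Like the output above, it counts rows rather than distinct values, which is why the Y1/Z1 cell is 2.

```python
import pandas as pd

df2 = pd.DataFrame({'X': ['X1', 'X1', 'X1', 'X1'],
                    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
                    'Z': ['Z3', 'Z1', 'Z1', 'Z2']})

# One pivot table per distinct X value, keyed by that value.
ptables = {}
for xval, xgroup in df2.groupby('X'):
    ptables[xval] = xgroup.pivot_table(values='X', index='Y', columns='Z',
                                       aggfunc='count')

print(ptables['X1'])
```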

Answer 3

Answer by julian peng

This is a good way of counting entries within .pivot_table:

df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')


        X1  X2
Y   Z       
Y1  Z1   1   1
    Z2   1  NaN
Y2  Z3   1  NaN

Answer 4

Answer by Manavalan Gajapathy

aggfunc=pd.Series.nunique provides a distinct count.

Credit to @hume for this solution (see comment under the accepted answer). Adding as answer here for better discoverability.

Answer 5

Answer by Javier

Since at least pandas version 0.16, pivot_table no longer accepts the "rows" parameter.

As of 0.23, the solution would be:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

Answer 6

Answer by Benoit Drogou

Since none of the answers is up to date with the latest version of pandas, I am writing another solution to this problem:

In [1]:
import pandas as pd

# Set up example data
df2 = pd.DataFrame({'X' : ['X1', 'X1', 'X1', 'X1'], 'Y' : ['Y2','Y1','Y1','Y1'], 'Z' : ['Z3','Z1','Z1','Z2']})

# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)

Out [1]:
Z   Z1  Z2  Z3
Y           
Y1  1.0 1.0 NaN
Y2  NaN NaN 1.0

Answer 7

Answer by grisaitis

For best performance I recommend calling DataFrame.drop_duplicates followed by aggfunc='count'.

Others are correct that aggfunc=pd.Series.nunique will work. This can be slow, however, if the number of index groups you have is large (>1000).

So instead of (to quote @Javier)

df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)

I suggest

df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')

This works because it guarantees that every subgroup (each combination of ('Y', 'Z')) will have unique (non-duplicate) values of 'X'.
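
As a quick sanity check of that claim, the sketch below builds both tables on the question's df2 and compares them. It assumes a recent pandas and uses the string alias 'nunique' in place of pd.Series.nunique; the variable names are illustrative only. No timing is measured here, so the performance difference is not demonstrated, only the equivalence of the results.

```python
import pandas as pd

df2 = pd.DataFrame({'X': ['X1', 'X1', 'X1', 'X1'],
                    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
                    'Z': ['Z3', 'Z1', 'Z1', 'Z2']})

# Distinct count computed per (Y, Z) group.
via_nunique = df2.pivot_table(values='X', index='Y', columns='Z',
                              aggfunc='nunique')

# Remove duplicate (X, Y, Z) rows first, then a plain count suffices.
via_dedup = (df2.drop_duplicates(['X', 'Y', 'Z'])
                .pivot_table(values='X', index='Y', columns='Z',
                             aggfunc='count'))

# Compare after filling empty cells with 0 so NaN does not affect equality.
same = (via_nunique.fillna(0).astype(int)
        .equals(via_dedup.fillna(0).astype(int)))
print(same)
```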
