Python pandas pivot_table: counting frequencies in one column

Question

Asked by midtownguru

I am still new to Python pandas' pivot_table and would like to ask how to count the frequencies of values in one column, grouped by an ID in another column. The DataFrame looks like the following.

import pandas as pd
df = pd.DataFrame({'Account_number':[1,1,2,2,2,3,3],
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B']
                  })

For the output, I'd like to get something like the following:

                Product
                A      B
Account_number           
      1         2      0
      2         1      2
      3         1      1

So far, I tried this code:

df.pivot_table(rows = 'Account_number', cols= 'Product', aggfunc='count')

This code gives me two copies of the same thing. What is the problem with the code above? Part of the reason I am asking is that this DataFrame is just an example. The real data I am working on has tens of thousands of account_numbers. Thanks in advance for your help!

Answer 1

Accepted answer by Andy Hayden

You need to specify the aggfunc as len:

In [11]: df.pivot_table(index='Account_number', columns='Product', 
                        aggfunc=len, fill_value=0)
Out[11]:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1

It looks like count is counting the instances of each column (Account_number and Product); it's not clear to me whether this is a bug...

Answer 2

Answered by PagMax

In newer versions of pandas, a slight modification is required. It took me some time to figure out, so I am adding it here so that others can use it directly.

df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
               fill_value=0)

Answer 3

Answered by Rui Wang

You can use count:

df.pivot_table(index='Account_number', columns='Product', aggfunc='count')

Answer 4

Answered by Ted Petrou

Solution: Use aggfunc='size'

Using aggfunc=len or aggfunc='count', as all the other answers on this page do, will not work for DataFrames with more than three columns. By default, pandas applies the aggfunc to every column not named in the index or columns parameters.

For instance, if we had two more columns in our original DataFrame defined like this:

df = pd.DataFrame({'Account_number':[1, 1, 2 ,2 ,2 ,3 ,3], 
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B'], 
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

Output:

   Account_number Product  Price  Quantity
0               1       A     10       100
1               1       A     10       100
2               2       A     10       100
3               2       B     10       100
4               2       B     10       100
5               3       A     10       100
6               3       B     10       100

If you apply the current solutions to this DataFrame, you would get the following:

df.pivot_table(index='Account_number',
               columns='Product',
               aggfunc=len,
               fill_value=0)

Output:

                  Price    Quantity   
Product            A  B        A  B
Account_number                     
1                  2  0        2  0
2                  1  2        1  2
3                  1  1        1  1
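The repeated Price and Quantity blocks can be reproduced with a plain groupby. This is only a sketch of the idea, assuming pivot_table aggregates each remaining column per (Account_number, Product) group, as the output above suggests; the variable names counts and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# count() is applied separately to every column that is not a grouping key,
# so Price and Quantity each get their own (identical) count column.
counts = df.groupby(['Account_number', 'Product']).count()
print(counts)

# Moving Product into the columns gives one A/B block per value column;
# combinations that never occur are filled with 0.
wide = counts.unstack(fill_value=0)
print(wide)
```

Every extra column in the DataFrame adds another identical block, which is why the result grows with the number of columns even though only one frequency table is wanted.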

Solution

Instead, use aggfunc='size'. Because size returns the same number for every column, pandas computes it once per group instead of calling it on every single column.

df.pivot_table(index='Account_number', 
               columns='Product',
               aggfunc='size',
               fill_value=0)

Output:

Product         A  B
Account_number      
1               2  0
2               1  2
3               1  1
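As a cross-check, the same frequency table can be built without pivot_table. The sketch below is not from the answers above: it uses groupby(...).size() followed by unstack, and pd.crosstab as a second route. The names by_size, by_crosstab and same are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# size() counts rows per group and returns a single Series,
# no matter how many other columns the DataFrame has.
sizes = df.groupby(['Account_number', 'Product']).size()

# Pivot Product into the columns; missing combinations become 0.
by_size = sizes.unstack(fill_value=0)
print(by_size)

# pd.crosstab computes a frequency table of two columns directly.
by_crosstab = pd.crosstab(df['Account_number'], df['Product'])
print(by_crosstab)

# Both routes should hold the same counts.
same = (by_size.to_numpy() == by_crosstab.to_numpy()).all()
print(same)
```

Either route only looks at the two key columns, so Price and Quantity do not affect the result.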
