Original question: http://stackoverflow.com/questions/22412033/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
python pandas pivot_table count frequency in one column
Asked by midtownguru
I am still new to pandas' pivot_table and would like to ask how to count the frequencies of values in one column, which is linked to an ID in another column. The DataFrame looks like the following.
import pandas as pd
df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B']})
For the output, I'd like to get something like the following:
                Product
                A  B
Account_number
1               2  0
2               1  2
3               1  1
So far, I tried this code:
df.pivot_table(rows = 'Account_number', cols= 'Product', aggfunc='count')
This code gives me the same thing twice. What is the problem with the code above? Part of the reason I am asking is that this DataFrame is just an example; the real data I am working on has tens of thousands of account numbers. Thanks in advance for your help!
Accepted answer by Andy Hayden
You need to specify the aggfunc as len:
In [11]: df.pivot_table(index='Account_number', columns='Product',
                        aggfunc=len, fill_value=0)
Out[11]:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1
It looks like count is counting the instances of each column (Account_number and Product); it's not clear to me whether this is a bug...
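For comparison, the same frequency table can be built without pivot_table by grouping on both columns and counting rows per group. This is a minimal sketch using the two-column DataFrame from the question; the variable name counts is only illustrative and does not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B']})

# Count the rows in each (Account_number, Product) group, then move
# Product from the row index into the columns. Combinations that never
# occur (for example account 1 with product B) are filled with 0.
counts = (df.groupby(['Account_number', 'Product'])
            .size()
            .unstack(fill_value=0))

print(counts)
```

Because size() counts rows rather than the values in any particular column, it does not depend on which other columns the DataFrame contains.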
Answer by PagMax
In newer versions of pandas, a slight modification is required. It took me some time to figure out, so I am adding it here so that others can use it directly.
df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
               fill_value=0)
Answer by Rui Wang
You can use count:
df.pivot_table(index='Account_number', columns='Product', aggfunc='count')
Answer by Ted Petrou
Solution: Use aggfunc='size'
Using aggfunc=len or aggfunc='count' like all the other answers on this page will not work for DataFrames with more than three columns. By default, pandas will apply this aggfunc to all the columns not found in the index or columns parameters.
For instance, if we had two more columns in our original DataFrame defined like this:
df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})
Output:
   Account_number Product  Price  Quantity
0               1       A     10       100
1               1       A     10       100
2               2       A     10       100
3               2       B     10       100
4               2       B     10       100
5               3       A     10       100
6               3       B     10       100
If you apply the current solutions to this DataFrame, you would get the following:
df.pivot_table(index='Account_number',
               columns='Product',
               aggfunc=len,
               fill_value=0)
Output:
               Price    Quantity
Product            A  B        A  B
Account_number
1                  2  0        2  0
2                  1  2        1  2
3                  1  1        1  1
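The repeated Price and Quantity blocks can be reproduced with a plain groupby, which may help show where they come from. The following is a rough sketch, not a description of pandas internals; the names grouped, per_column and per_group are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

grouped = df.groupby(['Account_number', 'Product'])

# count() produces one count per remaining column, so the result is a
# DataFrame with a Price column and a Quantity column.
per_column = grouped.count()

# size() produces a single row count per group, so the result is a
# Series regardless of how many other columns exist.
per_group = grouped.size()

print(per_column)
print(per_group)
```

A per-column aggregation yields one block of counts for every extra column, while a per-group row count yields just one number per (Account_number, Product) pair.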
Solution
Instead, use aggfunc='size'. Since size always returns the same number for each column, pandas does not call it on every single column and just does it once.
df.pivot_table(index='Account_number',
               columns='Product',
               aggfunc='size',
               fill_value=0)
Output:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1
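One way to sanity-check this result is to compare it with pd.crosstab, which tabulates frequencies of two columns directly. This is a sketch under the assumption of a reasonably recent pandas version; the names pivoted, crossed and same_counts are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# Frequency table via pivot_table, counting rows per group.
pivoted = df.pivot_table(index='Account_number', columns='Product',
                         aggfunc='size', fill_value=0)

# Frequency table via crosstab, which ignores Price and Quantity
# because only the two columns of interest are passed in.
crossed = pd.crosstab(df['Account_number'], df['Product'])

# Compare the raw counts; dtypes may differ between pandas versions.
same_counts = (pivoted.to_numpy() == crossed.to_numpy()).all()
print(same_counts)
```

If the two tables ever disagree on real data, that is a hint that missing values or extra columns are affecting one of the approaches.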

