Pandas groupby for zero values
Source: http://stackoverflow.com/questions/37003100/
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors on StackOverflow (not to the compiler of this page).
Asked by NinjaGaiden
I have data like this in a CSV file:
Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
I am able to read it and group it like this:
df.groupby(['Symbol','Year']).count()
I get:
             Action
Symbol Year
AAPL   2001       2
BAC    2002       2
I want this instead (order does not matter):
             Action
Symbol Year
AAPL   2001       2
AAPL   2002       0
BAC    2001       0
BAC    2002       2
I want to know if it's possible to count zero occurrences, so that combinations missing from the data show up with a count of 0.
Accepted answer by jezrael
You can use pivot_table with unstack:
print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count').unstack())
Year  Symbol
2001  AAPL      2
      BAC       0
2002  AAPL      0
      BAC       2
dtype: int64
If you need the output as a DataFrame, use to_frame:
print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count')
        .unstack()
        .to_frame()
        .rename(columns={0: 'Action'}))
             Action
Year Symbol
2001 AAPL         2
     BAC          0
2002 AAPL         0
     BAC          2
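For anyone who wants to try the pivot_table approach end to end, here is a minimal self-contained sketch. It assumes the sample rows from the question, loaded from an in-memory string through io.StringIO instead of a real CSV file; the names csv_text, wide and result are only illustrative and do not come from the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question (an in-memory stand-in for the CSV file).
csv_text = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count 'Action' values per Symbol (rows) and Year (columns);
# combinations that never occur in the data are filled with 0.
wide = df.pivot_table(index='Symbol', columns='Year', values='Action',
                      fill_value=0, aggfunc='count')

# unstack() turns the wide table into a Series indexed by (Year, Symbol);
# to_frame('Action') converts it to a one-column DataFrame named 'Action'.
result = wide.unstack().to_frame('Action')
print(result)
```

Passing the column name directly to to_frame is just a shorter alternative to calling to_frame() and then renaming column 0.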
Answer by Joe
You can use this:
df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
print(df)
Output:
             Action
Symbol Year
AAPL   2001       2
       2002       0
BAC    2001       0
       2002       2
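The one-liner can be broken into steps to see why it works: unstack moves Year into the columns, which creates a cell for every Symbol/Year pair (filled with 0 where there was no data), and stack moves Year back into the index. The sketch below assumes the sample rows from the question, read from an in-memory string; the intermediate names counts, wide and result are illustrative only.

```python
import io

import pandas as pd

# Sample data copied from the question (an in-memory stand-in for the CSV file).
csv_text = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: the plain groupby count only contains combinations that occur.
counts = df.groupby(['Symbol', 'Year']).count()

# Step 2: move Year into the columns; missing cells are filled with 0.
wide = counts.unstack(fill_value=0)

# Step 3: move Year back into the index, now with every combination present.
result = wide.stack()
print(result)
```

Depending on the installed pandas version, the stack() call may emit a FutureWarning about its new implementation; the resulting counts should not be affected.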
Answer by Punit S
If you want to do this without using pivot_table, you can try the approach below:
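df_grouped_by = df.groupby(['Symbol', 'Year']).count()  # assumed definition: the grouped counts from the question
midx = pd.MultiIndex.from_product([df['Symbol'].unique(), df['Year'].unique()], names=['Symbol', 'Year'])
df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)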
What we are essentially doing above is building a MultiIndex from the Cartesian product of the unique values in the two columns, and then reindexing the grouped dataframe against it, so that combinations missing from the data are filled with zeros.
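Here is a small runnable sketch of the reindex idea, assuming the sample rows from the question (loaded from an in-memory string) and assuming that the grouped dataframe is the groupby count shown in the question. The names counts and result are hypothetical.

```python
import io

import pandas as pd

# Sample data copied from the question (an in-memory stand-in for the CSV file).
csv_text = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""
df = pd.read_csv(io.StringIO(csv_text))

# Grouped counts: only (AAPL, 2001) and (BAC, 2002) are present.
counts = df.groupby(['Symbol', 'Year']).count()

# Every Symbol/Year pair, i.e. the Cartesian product of the unique values.
midx = pd.MultiIndex.from_product(
    [df['Symbol'].unique(), df['Year'].unique()],
    names=['Symbol', 'Year'],
)

# Reindex against the full product; pairs absent from counts get 0.
result = counts.reindex(midx, fill_value=0)
print(result)
```

The row order of the result follows the order of the unique values in the data, which is fine here since the question says order does not matter.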
Answer by Anjul Tyagi
Step 1: Create a dataframe that stores the count of each non-zero class in a column named counts.
count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts')
Step 2: Now use pivot_table with dropna=False to get the desired dataframe, with counts for both existing and non-existing classes.
df_final = pd.pivot_table(count_df,
                          index=['Symbol', 'Year'],
                          values='counts',
                          fill_value=0,
                          dropna=False,
                          aggfunc='sum')  # 'sum' is equivalent to np.sum here and needs no numpy import
Now the values of the counts can be extracted as a list with the command:
list(df_final['counts'])
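Putting both steps together, a self-contained sketch could look like the following. It assumes the sample rows from the question, read from an in-memory string rather than a file, and it uses aggfunc='sum' as a stand-in for np.sum; counts_list is an illustrative name.

```python
import io

import pandas as pd

# Sample data copied from the question (an in-memory stand-in for the CSV file).
csv_text = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: one row per Symbol/Year pair that occurs, with its size in 'counts'.
count_df = df.groupby(['Symbol', 'Year']).size().reset_index(name='counts')

# Step 2: dropna=False keeps every combination of the index levels,
# and fill_value=0 replaces the missing counts with 0.
df_final = pd.pivot_table(count_df,
                          index=['Symbol', 'Year'],
                          values='counts',
                          fill_value=0,
                          dropna=False,
                          aggfunc='sum')

# Extract the counts as a plain Python list.
counts_list = list(df_final['counts'])
print(df_final)
print(counts_list)
```

It is the dropna=False argument that makes pivot_table keep the Symbol/Year pairs with no rows; without it only the two observed pairs would remain.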