Pandas groupby for zero values

Question

Asked by NinjaGaiden

I have data like this in a CSV file:

Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002

I can read it and group it like this:

df.groupby(['Symbol','Year']).count()

I get:

             Action
Symbol Year        
AAPL   2001       2
BAC    2002       2
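
To reproduce the behaviour, here is a minimal sketch. The `io.StringIO` wrapper and the `CSV_TEXT` name are assumptions standing in for the actual CSV file, whose path is not shown in the question.

```python
import io

import pandas as pd

# Sample rows copied from the question; StringIO stands in for the real file.
CSV_TEXT = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# groupby only emits rows for (Symbol, Year) pairs that occur in the data,
# so combinations with no rows are simply absent rather than counted as 0.
counts = df.groupby(['Symbol', 'Year']).count()
print(counts)
```

The two missing pairs, (AAPL, 2002) and (BAC, 2001), never appear in the index of `counts`, which is why no zero is reported for them.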

I want this (order does not matter):

             Action
Symbol Year        
AAPL   2001       2
AAPL   2002       0
BAC    2001       0
BAC    2002       2

I want to know whether it is possible to count zero occurrences.

Answer 1

Accepted answer by jezrael

You can use pivot_table with unstack:

print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count').unstack())

Year  Symbol
2001  AAPL      2
      BAC       0
2002  AAPL      0
      BAC       2
dtype: int64

If you need the output as a DataFrame, use to_frame:

# The outer parentheses let the method chain continue across lines.
print(df.pivot_table(index='Symbol',
                     columns='Year',
                     values='Action',
                     fill_value=0,
                     aggfunc='count')
        .unstack()
        .to_frame()
        .rename(columns={0: 'Action'}))

             Action
Year Symbol        
2001 AAPL         2
     BAC          0
2002 AAPL         0
     BAC          2
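
For readers who want to run this approach end to end, the following sketch rebuilds the sample data inline (the `wide` and `long_counts` names are illustrative, not from the answer) and shows the intermediate wide table before it is unstacked.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a file.
df = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'BAC', 'BAC'],
    'Action': ['Buy', 'Buy', 'Sell', 'Sell'],
    'Year': [2001, 2001, 2002, 2002],
})

# Wide table: one row per Symbol, one column per Year.
# Cells with no matching rows are filled with 0 via fill_value.
wide = df.pivot_table(index='Symbol',
                      columns='Year',
                      values='Action',
                      fill_value=0,
                      aggfunc='count')
print(wide)

# unstack() turns the wide table into a Series indexed by (Year, Symbol).
long_counts = wide.unstack()
print(long_counts)
```

The zeros come from the wide table: every Symbol row has a cell for every Year column, so the empty cells exist and can be filled before reshaping.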

Answer 2

Answer by Joe

You can use this:

df = df.groupby(['Symbol', 'Year']).count().unstack(fill_value=0).stack()
print(df)

Output:

             Action
Symbol Year        
AAPL   2001       2
       2002       0
BAC    2001       0
       2002       2
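
A closely related variant, shown here as a sketch rather than as part of the answer, uses size() instead of count(). size() counts rows per group, whereas count() counts the non-null values in each remaining column, so the result is a Series that does not depend on the Action column. The `result` name is illustrative.

```python
import pandas as pd

# Sample data from the question, built inline.
df = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'BAC', 'BAC'],
    'Action': ['Buy', 'Buy', 'Sell', 'Sell'],
    'Year': [2001, 2001, 2002, 2002],
})

# size() gives a Series of row counts indexed by (Symbol, Year).
# unstack(fill_value=0) spreads Year across columns, filling gaps with 0;
# stack() folds the columns back into the index.
result = (df.groupby(['Symbol', 'Year'])
            .size()
            .unstack(fill_value=0)
            .stack())
print(result)
```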

Answer 3

Answer by Punit S

If you want to do this without pivot_table, you can try the approach below:

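# Counts for the (Symbol, Year) pairs that exist; this is the frame to reindex.
df_grouped_by = df.groupby(['Symbol', 'Year']).count()
midx = pd.MultiIndex.from_product([df['Symbol'].unique(), df['Year'].unique()], names=['Symbol', 'Year'])
df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)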

What we are essentially doing above is building a MultiIndex from the Cartesian product of the unique values in the two columns, then reindexing the grouped dataframe with it so that every combination missing from the data is filled with zero.
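
As a self-contained sketch of this idea, with the sample data built inline and `result` as an illustrative name:

```python
import pandas as pd

# Sample data from the question, built inline.
df = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'BAC', 'BAC'],
    'Action': ['Buy', 'Buy', 'Sell', 'Sell'],
    'Year': [2001, 2001, 2002, 2002],
})

# Counts for the pairs that actually occur (two rows here).
df_grouped_by = df.groupby(['Symbol', 'Year']).count()

# Every Symbol paired with every Year: 2 symbols x 2 years = 4 index entries.
midx = pd.MultiIndex.from_product(
    [df['Symbol'].unique(), df['Year'].unique()],
    names=['Symbol', 'Year'],
)

# reindex keeps existing rows and creates the missing ones with fill_value.
result = df_grouped_by.reindex(midx, fill_value=0)
print(result)
```

Because the new index is built from the unique values in each column, a Symbol or Year that never appears anywhere in the data will not show up in the result either; it would have to be added to the lists passed to from_product.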

Answer 4

Answer by Anjul Tyagi

Step 1: Create a dataframe that stores the count of each non-zero class in a column named counts.

count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts')

Step 2: Use pivot_table to get the desired dataframe, with counts for both existing and non-existing classes.

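import numpy as np
import pandas as pd

# dropna=False keeps index combinations that have no rows;
# fill_value=0 then gives them a count of 0.
df_final = pd.pivot_table(count_df,
                          index=['Symbol', 'Year'],
                          values='counts',
                          fill_value=0,
                          dropna=False,
                          aggfunc=np.sum)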

The counts can now be extracted as a list with:

list(df_final['counts'])
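
Putting the two steps together, a runnable sketch might look like the following. It assumes the sample data from the question built inline, and it passes the string 'sum' as aggfunc instead of np.sum, which is a substitution made here to avoid the NumPy import and is not what the answer wrote.

```python
import pandas as pd

# Sample data from the question, built inline.
df = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'BAC', 'BAC'],
    'Action': ['Buy', 'Buy', 'Sell', 'Sell'],
    'Year': [2001, 2001, 2002, 2002],
})

# Step 1: one row per existing (Symbol, Year) pair with its row count.
count_df = (df.groupby(['Symbol', 'Year'])
              .size()
              .reset_index(name='counts'))

# Step 2: with dropna=False, pivot_table expands the index to all
# combinations of the Symbol and Year levels; fill_value=0 fills the gaps.
df_final = pd.pivot_table(count_df,
                          index=['Symbol', 'Year'],
                          values='counts',
                          fill_value=0,
                          dropna=False,
                          aggfunc='sum')
print(df_final)

# Plain Python list of the counts, in index order.
counts_list = list(df_final['counts'])
```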
