pandas: How to filter a pivot table in Python

Question

Asked by Kyuu

How do I filter a pivot table so that it returns only specific columns? Currently my DataFrame looks like this:

print(table)
                    sum            
Sex              Female  Male   All
Date (Intervals)                   
April               166   191   357
August              212   263   475
December            173   263   436
February            192   298   490
January             148   195   343
July                189   260   449
June                165   238   403
March               165   278   443
May                 236   253   489
November            167   247   414
October             185   287   472
September           175   306   481
All                2173  3079  5252

I want to display only the Male column. I tried the following code:

table.query('Sex == "Male"')

However, I got this error:

TypeError: Expected tuple, got str

How can I filter my table down to specified rows or columns?

Answer 1

Answered by unutbu

It looks like table has a column MultiIndex:

                    sum            
Sex              Female  Male   All

One way to check whether your table has a column MultiIndex is to inspect table.columns:

In [178]: table.columns
Out[178]: 
MultiIndex(levels=[['sum'], ['All', 'Female', 'Male']],
           labels=[[0, 0, 0], [1, 2, 0]],
           names=[None, 'sex'])
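The same check can be done programmatically. The snippet below is a minimal sketch on a small, made-up DataFrame (the data and the variable names has_multiindex and n_levels are assumptions for illustration, not part of the original question):

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})

# No values= argument, so the value column name ('sum') is kept
# as the outer level of the resulting columns.
table = df.pivot_table(index='date', columns='sex', aggfunc='sum')

# True when the columns are a MultiIndex rather than a flat Index.
has_multiindex = isinstance(table.columns, pd.MultiIndex)
# Number of levels in the column index.
n_levels = table.columns.nlevels
print(has_multiindex, n_levels)
```

If has_multiindex is true, each column label is a tuple with one entry per level.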

To access a column of table, you need to specify a value for each level of the MultiIndex:

In [179]: list(table.columns)
Out[179]: [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')]

Thus, to select the Male column, you would use:

In [176]: table[('sum', 'Male')]
Out[176]: 
date
April         42.0
August        34.0
December      32.0
...
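For comparison, here is a sketch of a few equivalent ways to pull out the Male column from a table with two column levels. It uses made-up data, and the names male, male_loc and male_xs are illustrative only. Note that DataFrame.query filters rows by the values in a column, so it is not the tool for picking a column by its label.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})
table = df.pivot_table(index='date', columns='sex', aggfunc='sum')

# Full tuple key: one value per column level. Returns a Series.
male = table[('sum', 'Male')]

# The same selection through .loc (all rows, one column).
male_loc = table.loc[:, ('sum', 'Male')]

# Cross-section on the 'sex' level only. Returns a DataFrame whose
# remaining column label is 'sum'.
male_xs = table.xs('Male', axis=1, level='sex')

print(male)
```

The tuple form is the most direct; xs is convenient when you only want to name the inner level.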

Since the sum level is unnecessary, you could get rid of it by specifying the values parameter when calling df.pivot or df.pivot_table.

table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
# sex        Female   Male     All
# date                            
# April        40.0   40.0    80.0
# August       48.0   32.0    80.0
# December     48.0   44.0    92.0

For example, the script below builds a sample DataFrame and selects the Male column both ways:

import numpy as np
import pandas as pd
import calendar
np.random.seed(2016)
N = 1000
sex = np.random.choice(['Male', 'Female'], size=N)
date = np.random.choice(calendar.month_name[1:13], size=N)
df = pd.DataFrame({'sex':sex, 'date':date, 'sum':1})

# This reproduces a table similar to yours
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)
print(table[('sum', 'Male')])

# table2 has a single level Index
table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
print(table2['Male'])

Another way to remove the sum level would be to use table = table['sum'], or table.columns = table.columns.droplevel(0).
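Once the columns are flat, ordinary label-based selection works for both rows and columns. The following is a rough sketch on made-up data; the names flat_a, flat_b, no_margins and subset are assumptions chosen for this example:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})
table = df.pivot_table(index='date', columns='sex', aggfunc='sum')

# Option 1: select the outer level, leaving only the 'sex' level.
flat_a = table['sum']

# Option 2: drop the outer level in place on a copy.
flat_b = table.copy()
flat_b.columns = flat_b.columns.droplevel(0)

# With values= and margins=True the result already has flat columns,
# plus an 'All' row and an 'All' column holding the totals.
table2 = df.pivot_table(index='date', columns='sex', values='sum',
                        aggfunc='sum', margins=True)

# Drop the totals, then keep only chosen rows and columns by label.
no_margins = table2.drop(index='All', columns='All')
subset = no_margins.loc[['April'], ['Male']]
print(subset)
```

Passing lists to .loc keeps the result a DataFrame; passing a single label for the column would return a Series instead.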
