pandas: how to filter pivot tables in Python

Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/37366208/

Date: 2020-09-14 01:16:09  Source: igfitidea

How to filter pivot tables in Python

Tags: python, pandas, dataframe, pivot-table

Asked by Kyuu

How do I filter a pivot table so that it returns only specific columns? Currently my dataframe looks like this:

print(table)
                    sum            
Sex              Female  Male   All
Date (Intervals)                   
April               166   191   357
August              212   263   475
December            173   263   436
February            192   298   490
January             148   195   343
July                189   260   449
June                165   238   403
March               165   278   443
May                 236   253   489
November            167   247   414
October             185   287   472
September           175   306   481
All                2173  3079  5252

I want to display only the results in the Male column. I tried the following code:

table.query('Sex == "Male"')

However, I got this error:

TypeError: Expected tuple, got str

How can I filter my table down to specified rows or columns?

Answered by unutbu

It looks like table has a column MultiIndex:

                    sum            
Sex              Female  Male   All

One way to check whether your table has a column MultiIndex is to inspect table.columns:

In [178]: table.columns
Out[178]: 
MultiIndex(levels=[['sum'], ['All', 'Female', 'Male']],
           labels=[[0, 0, 0], [1, 2, 0]],
           names=[None, 'sex'])
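
The same check can also be done programmatically. The snippet below is only a sketch: the four-row df and its values are made up for illustration (they are not the asker's data), and it assumes the table was built with pivot_table without a values argument, as in the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are invented.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)

# True when the columns have more than one level
has_multiindex = isinstance(table.columns, pd.MultiIndex)
# Number of levels in the column index
n_levels = table.columns.nlevels
print(has_multiindex, n_levels)
```

If has_multiindex is True, a plain string such as 'Male' is not a complete column key, so the selections that follow need one value per level.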

To access a column of table, you need to specify a value for each level of the MultiIndex:

In [179]: list(table.columns)
Out[179]: [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')]

Thus, to select the Male column, you would use

In [176]: table[('sum', 'Male')]
Out[176]: 
date
April         42.0
August        34.0
December      32.0
...
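
Since the question also asks about filtering rows, here is a small sketch of a few equivalent selections. The df below is a made-up stand-in (its values are invented for illustration), and the level name 'sex' is taken from the table.columns output above. Note that DataFrame.query filters rows by a boolean expression, so it is not the tool for picking a column.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are invented.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)

# Full tuple key: returns the Male column as a Series
male = table[('sum', 'Male')]

# Cross-section on the 'sex' column level: returns a DataFrame
# containing only the columns whose 'sex' label is 'Male'
male_xs = table.xs('Male', axis=1, level='sex')

# Rows and column together: .loc takes row labels first, then the column key
subset = table.loc[['April', 'May'], ('sum', 'Male')]
print(subset)
```

The .loc form is probably the most general of the three, because the same call can restrict rows and columns at once.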


Since the sum level is unnecessary, you could get rid of it by specifying the values parameter when calling df.pivot or df.pivot_table.

table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
# sex        Female   Male     All
# date                            
# April        40.0   40.0    80.0
# August       48.0   32.0    80.0
# December     48.0   44.0    92.0

For example,


import numpy as np
import pandas as pd
import calendar
np.random.seed(2016)
N = 1000
sex = np.random.choice(['Male', 'Female'], size=N)
date = np.random.choice(calendar.month_name[1:13], size=N)
df = pd.DataFrame({'sex':sex, 'date':date, 'sum':1})

# This reproduces a table similar to yours
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)
print(table[('sum', 'Male')])

# table2 has a single level Index
table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
print(table2['Male'])


Another way to remove the sum level would be to use table = table['sum'], or table.columns = table.columns.droplevel(0).
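
A short sketch of both options, again on a made-up df (the values are invented for illustration and the name flat is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are invented.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'date': ['April', 'April', 'May', 'May'],
    'sum': [1, 2, 3, 4],
})
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)

# Option 1: select the outer 'sum' level, which yields single-level columns
flat = table['sum']

# Option 2: drop the outer level of the column index in place
table.columns = table.columns.droplevel(0)

# Either way, a plain string now works as a column key
print(flat['Male'])
print(table['Male'])
```

Option 1 leaves the original table untouched and returns a new frame, while option 2 modifies table itself, so which one fits depends on whether you still need the two-level columns elsewhere.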