pandas: How to filter pivot tables in Python
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/37366208/
Asked by Kyuu
How do I filter a pivot table to return specific columns? Currently my DataFrame looks like this:
print table
                     sum
Sex               Female  Male   All
Date (Intervals)
April                166   191   357
August               212   263   475
December             173   263   436
February             192   298   490
January              148   195   343
July                 189   260   449
June                 165   238   403
March                165   278   443
May                  236   253   489
November             167   247   414
October              185   287   472
September            175   306   481
All                 2173  3079  5252
I want to display only the Male column. I tried the following code:
table.query('Sex == "Male"')
However, I got this error:
TypeError: Expected tuple, got str
How can I filter my table by specified rows or columns?
Answered by unutbu
It looks like table has a column MultiIndex:
sum
Sex Female Male All
One way to check whether your table has a column MultiIndex is to inspect table.columns:
In [178]: table.columns
Out[178]:
MultiIndex(levels=[['sum'], ['All', 'Female', 'Male']],
           labels=[[0, 0, 0], [1, 2, 0]],
           names=[None, 'sex'])
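The same check can also be done programmatically rather than by reading the repr. The following is a minimal sketch, assuming a small hand-built stand-in for the table in the question (the two rows of data and the variable names has_multiindex and n_levels are illustrative, not from the original answer):

```python
import pandas as pd

# Hypothetical stand-in for the pivot table in the question:
# a top-level 'sum' label above the three 'sex' labels.
columns = pd.MultiIndex.from_tuples(
    [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')],
    names=[None, 'sex'],
)
table = pd.DataFrame(
    [[166, 191, 357], [212, 263, 475]],
    index=pd.Index(['April', 'August'], name='date'),
    columns=columns,
)

# True when the columns have more than one level of labels
has_multiindex = isinstance(table.columns, pd.MultiIndex)
# Number of levels a full column key must cover
n_levels = table.columns.nlevels

print(has_multiindex, n_levels)
```

If n_levels is 2, a full column key is a 2-tuple such as ('sum', 'Male').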
To access a column of table, you need to specify a value for each level of the MultiIndex:
In [179]: list(table.columns)
Out[179]: [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')]
Thus, to select the Male column, you would use
In [176]: table[('sum', 'Male')]
Out[176]:
date
April 42.0
August 34.0
December 32.0
...
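As a possible alternative that is not part of the original answer, DataFrame.xs can select on a single level of the column MultiIndex without spelling out the full tuple. This is a sketch on a hand-built stand-in table (the data and the names male_by_tuple and male_by_xs are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the pivot table, with the column levels
# named [None, 'sex'] as in the table.columns output above.
columns = pd.MultiIndex.from_tuples(
    [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')],
    names=[None, 'sex'],
)
table = pd.DataFrame(
    [[166, 191, 357], [212, 263, 475]],
    index=pd.Index(['April', 'August'], name='date'),
    columns=columns,
)

# Full tuple key: returns a Series
male_by_tuple = table[('sum', 'Male')]

# Cross-section on the 'sex' level only: returns a DataFrame whose
# columns are the remaining level(s)
male_by_xs = table.xs('Male', axis=1, level='sex')

print(male_by_tuple)
print(male_by_xs)
```

The tuple form returns a Series, while xs keeps a DataFrame, which may be preferable when more columns will be selected later.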
Since the sum level is unnecessary, you could get rid of it by specifying the values parameter when calling df.pivot or df.pivot_table.
table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
# sex       Female  Male   All
# date
# April       40.0  40.0  80.0
# August      48.0  32.0  80.0
# December    48.0  44.0  92.0
For example,
import numpy as np
import pandas as pd
import calendar
np.random.seed(2016)
N = 1000
sex = np.random.choice(['Male', 'Female'], size=N)
date = np.random.choice(calendar.month_name[1:13], size=N)
df = pd.DataFrame({'sex':sex, 'date':date, 'sum':1})
# This reproduces a table similar to yours
table = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True)
print(table[('sum', 'Male')])
# table2 has a single level Index
table2 = df.pivot_table(index='date', columns='sex', aggfunc='sum', margins=True,
                        values='sum')
print(table2['Male'])
Another way to remove the sum level would be to use table = table['sum'], or table.columns = table.columns.droplevel(0).
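Once the sum level is gone, ordinary single-level selection works for both columns and rows, which is what the question asked for. Here is a rough sketch on a hand-built stand-in table; the three rows of data, the boolean-mask filter, and the variable names are illustrative assumptions rather than part of the original answer:

```python
import pandas as pd

# Hypothetical stand-in for the pivot table in the question
columns = pd.MultiIndex.from_tuples(
    [('sum', 'Female'), ('sum', 'Male'), ('sum', 'All')],
    names=[None, 'sex'],
)
table = pd.DataFrame(
    [[166, 191, 357], [212, 263, 475], [2173, 3079, 5252]],
    index=pd.Index(['April', 'August', 'All'], name='date'),
    columns=columns,
)

# Option 1: select the top-level key, leaving a single-level column Index
flat_by_key = table['sum']

# Option 2: drop level 0 of the columns on a copy
flat = table.copy()
flat.columns = flat.columns.droplevel(0)

# Column filter: a plain label now works
male = flat['Male']

# Row and column filter by label
subset = flat.loc[['April', 'August'], ['Male']]

# Row filter by condition on a column
over_200 = flat[flat['Male'] > 200]

print(male)
print(subset)
print(over_200)
```

Working on a copy in option 2 keeps the original table intact; assigning to table.columns directly, as in the answer, modifies it in place.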