Hierarchical Multi-index counts in Pandas
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/25126692/
Asked by Amelio Vazquez-Reina
Say I have a multi-index dataframe in Pandas, e.g.:
                    A         B         C
X   Y     Z
bar one   a -0.007381 -0.365315 -0.024817
          b -1.219794  0.370955 -0.795125
baz three a  0.145578  1.428502 -0.408384
          b -0.249321 -0.292967 -1.849202
    two   a -0.249321 -0.292967 -1.849202
    four  a  0.211234 -0.967123  1.202234
foo one   b -1.046479 -1.250595  0.781722
          a  1.314373  0.333150  0.133331
qux one   c  0.716789  0.616471 -0.298493
    two   b  0.385795 -0.915417 -1.367644
How can I count how many distinct values of one level are contained within another level? (e.g. level Y within X)
E.g. in the case above the answer would be:
X    Y
bar  1
baz  3
foo  1
qux  2
Update
When I try df.groupby(level=[0, 1]).count()[0] I get:
            C  D  E
A    B
bar  one    1  1  1
     three  1  1  1
flux six    1  1  1
     three  1  1  1
foo  five   1  1  1
     one    1  1  1
     two    2  2  2
Answered by joris
You can do the following: group by level X and then calculate the number of unique values of Y in each group, which is easier once the index has been reset:
In [15]: df.reset_index().groupby('X')['Y'].nunique()
Out[15]:
X
bar 1
baz 3
foo 1
qux 2
Name: Y, dtype: int64
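To try this out end to end, here is a minimal sketch that rebuilds the index of the example frame from the question and applies the same call. The column values are placeholders (only the index matters for the count), and the second variant, which goes through `MultiIndex.to_frame`, is an alternative added here for illustration rather than part of the original answer.

```python
import pandas as pd

# Index tuples copied from the question's example frame (levels X, Y, Z).
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "one", "a"), ("bar", "one", "b"),
        ("baz", "three", "a"), ("baz", "three", "b"),
        ("baz", "two", "a"), ("baz", "four", "a"),
        ("foo", "one", "b"), ("foo", "one", "a"),
        ("qux", "one", "c"), ("qux", "two", "b"),
    ],
    names=["X", "Y", "Z"],
)

# Placeholder values: the numbers in A, B, C do not affect the count.
df = pd.DataFrame({"A": range(10), "B": range(10), "C": range(10)}, index=index)

# The approach from the answer: flatten the index, then count distinct Y per X.
counts = df.reset_index().groupby("X")["Y"].nunique()

# Alternative (an assumption of this sketch): convert only the index to a
# frame, so the data columns are never copied.
counts_from_index = df.index.to_frame(index=False).groupby("X")["Y"].nunique()

print(counts)
```

Both variants should give bar 1, baz 3, foo 1, qux 2 for this index, matching the output requested in the question.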
Answered by Papalagui
I think this should work as well:
For level A:
df.groupby(level='A').size()
For level B:
df.groupby(level=['A','B']).size()
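One caveat worth checking: `size()` counts rows per group, not distinct sub-levels. The sketch below uses the question's X/Y/Z index (the level names X and Y stand in for this answer's A and B, and the single data column is a placeholder) to show the difference, and how a second `groupby` on the result could recover the distinct count. Treat it as an illustration, not as what the answer's author intended.

```python
import pandas as pd

# Same index as the example frame in the question (levels X, Y, Z).
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "one", "a"), ("bar", "one", "b"),
        ("baz", "three", "a"), ("baz", "three", "b"),
        ("baz", "two", "a"), ("baz", "four", "a"),
        ("foo", "one", "b"), ("foo", "one", "a"),
        ("qux", "one", "c"), ("qux", "two", "b"),
    ],
    names=["X", "Y", "Z"],
)
df = pd.DataFrame({"A": range(10)}, index=index)  # placeholder data

# Rows per value of X: bar 2, baz 4, foo 2, qux 2.
rows_per_x = df.groupby(level="X").size()

# Rows per (X, Y) pair: one entry for each distinct pair.
rows_per_xy = df.groupby(level=["X", "Y"]).size()

# Counting those entries per X gives the distinct Y values within each X:
# bar 1, baz 3, foo 1, qux 2.
distinct_y_per_x = rows_per_xy.groupby(level="X").size()

print(distinct_y_per_x)
```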
Answered by Kuldeep
You can always add a suffix to the column name and reset the index after converting the result to a DataFrame.
Let's say I have a pandas.core.series.Series object "s":
>> s = train.groupby('column_name').item_id.value_counts()
>> type(s)
pandas.core.series.Series
>> y = s.to_frame()
>> data = y.add_suffix('_Count').reset_index()
>> data.head() # a pandas DataFrame whose count column name now ends in "_Count"
This converts the multi-index Series into a DataFrame with a single-level index.
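A runnable sketch of the same steps follows. The `train` frame and its values are hypothetical stand-ins, since the answer does not show its data. The exact name of the count column depends on the pandas version (older releases name the Series after `item_id`, newer ones use `count`), so the sketch looks the column up by position instead of hard-coding it.

```python
import pandas as pd

# Hypothetical stand-in for the answer's `train` frame.
train = pd.DataFrame(
    {
        "column_name": ["u1", "u1", "u1", "u2"],
        "item_id": [10, 10, 20, 10],
    }
)

# Multi-index Series: (column_name, item_id) -> number of occurrences.
s = train.groupby("column_name")["item_id"].value_counts()

# Rename the value column first so it cannot clash with the "item_id"
# index level, then move both index levels back into ordinary columns.
data = s.to_frame().add_suffix("_Count").reset_index()

count_column = data.columns[-1]  # version-dependent name ending in "_Count"
print(data)
```

Adding the suffix before `reset_index()` matters on versions where the Series is named `item_id`: without it, resetting the index would try to insert a second column with the same name.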

