Python: group by and find the top n value_counts in pandas

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35364601/

Time: 2020-08-19 16:21:00  Source: igfitidea

Group by and find top n value_counts pandas

Tags: python, pandas, dataframe

Asked by ytk

I have a dataframe of taxi data with columns that look like this:

Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough with the most pickups. I tried this:

df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:

borough                          
Bronx          High  Bridge          3424
               Mott Haven            2515
               Concourse Village     1443
               Port Morris           1153
               Melrose                492
               North Riverdale        463
               Eastchester            434
               Concourse              395
               Fordham                252
               Wakefield              214
               Kingsbridge            212
               Mount Hope             200
               Parkchester            191
......

Staten Island  Castleton Corners        4
               Dongan Hills             4
               Eltingville              4
               Graniteville             4
               Great Kills              4
               Castleton                3
               Woodrow                  1

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.
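
For anyone who wants to try the answers, the setup can be reproduced with a small frame. This is only a sketch: the five rows are copied from the sample table in the question, and the Time values X, Y, Z, A, B are kept as placeholder strings because the real data is not shown.

```python
import pandas as pd

# Sample rows copied from the question; the Time values are placeholders.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Melrose", "Grant City", "Midtown", "Lincoln Square"],
    "Borough": ["Manhattan", "Bronx", "Staten Island", "Manhattan", "Manhattan"],
    "Time": ["X", "Y", "Z", "A", "B"],
})

# Count pickups per neighborhood within each borough (same call as in the question).
counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()
print(counts)
```

The result is a Series with a two-level index (borough, then neighborhood), sorted by count within each borough, so Midtown (2 pickups) is listed before Lincoln Square (1 pickup) under Manhattan.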

Accepted answer by jezrael

I think you can use nlargest; you can change 1 to 5:

# count pickups per neighborhood within each borough
s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough                      
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

# group on the Borough level only, so nlargest ranks neighborhoods within each borough
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

Note: nlargest on a grouped Series prepends the group key as an additional index level, which is why Borough appears twice in the output above.
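
A minimal sketch of the same idea on made-up data, assuming a reasonably recent pandas. The frame below is hypothetical (the counts are invented for illustration), and passing group_keys=False is a suggestion of mine rather than part of the answer: it keeps the original two-level index instead of repeating the borough.

```python
import pandas as pd

# Hypothetical pickups, invented for illustration only.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
                    + ["Melrose"] * 2 + ["Mott Haven"] + ["Grant City"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3 + ["Staten Island"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# Default behaviour: the borough is prepended again, giving three index levels.
with_keys = counts.groupby(level=0).nlargest(2)

# group_keys=False keeps the original (borough, neighborhood) index.
top = counts.groupby(level=0, group_keys=False).nlargest(2)
print(top)
```

With n=2, Chelsea is dropped from Manhattan, while boroughs with fewer than two neighborhoods simply return what they have.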

Answer by Alexander

You can do this in a single line by slightly extending your original groupby with 'nlargest':

>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood  
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64
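
One thing worth checking with this approach: nlargest chained directly onto the counted Series ranks every (borough, neighborhood) pair together, so it returns the overall top n rather than the top n per borough. The sketch below illustrates that on hypothetical data (the counts are invented) where one borough dominates.

```python
import pandas as pd

# Hypothetical data: Manhattan has far more pickups than the Bronx.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 4 + ["Lincoln Square"] * 3
                    + ["Chelsea"] * 2 + ["Melrose"],
    "Borough": ["Manhattan"] * 9 + ["Bronx"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# nlargest on the whole Series compares all boroughs against each other.
overall_top = counts.nlargest(3)
boroughs_in_top = set(overall_top.index.get_level_values(0))
print(overall_top)
print(boroughs_in_top)
```

Here all three rows come from Manhattan and the Bronx does not appear at all, so a per-borough result needs the ranking to happen inside a second groupby on the borough level.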

Answer by Khadijah Lawal

df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)

head(5) returns the first 5 rows of the result (here a Series of counts, so the rows are taken across all boroughs rather than per borough).
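
A possible per-borough variant of this idea, sketched on hypothetical data (the counts are invented). It relies on value_counts sorting each borough's counts in descending order by default, so taking the first n rows of each group with a second groupby gives that borough's top n.

```python
import pandas as pd

# Hypothetical pickups, invented for illustration only.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
                    + ["Melrose"] * 2 + ["Mott Haven"] + ["Grant City"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3 + ["Staten Island"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# Plain head(2): the first two rows of the whole Series (both from the Bronx).
first_rows = counts.head(2)

# Grouped head(2): the first two rows of each borough, index unchanged.
per_borough = counts.groupby(level=0).head(2)
print(per_borough)
```

Unlike nlargest, the grouped head does not add an extra index level, but it depends on the counts already being sorted within each borough.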
