Original question: http://stackoverflow.com/questions/35364601/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Group by and find top n value_counts pandas
Asked by ytk
I have a dataframe of taxi data that looks like this:
Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B
Each row represents one taxi pickup in that neighborhood of that borough. I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this:
df['Neighborhood'].groupby(df['Borough']).value_counts()
Which gives me something like this:
borough
Bronx          High Bridge          3424
               Mott Haven           2515
               Concourse Village    1443
               Port Morris          1153
               Melrose               492
               North Riverdale       463
               Eastchester           434
               Concourse             395
               Fordham               252
               Wakefield             214
               Kingsbridge           212
               Mount Hope            200
               Parkchester           191
......
Staten Island  Castleton Corners       4
               Dongan Hills            4
               Eltingville             4
               Graniteville            4
               Great Kills             4
               Castleton               3
               Woodrow                 1
How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.
Accepted answer by jezrael
I think you can use nlargest (you can change 1 to 5):
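s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64
# Group on the first index level (Borough) only, then keep the largest count per group.
# Grouping on both levels would put every row in its own group and filter nothing.
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64
Note that the group key (Borough) is repeated as an extra index level in the result. Passing group_keys=False to groupby avoids this.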
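As a self-contained sketch of the approach above, the following builds a small made-up frame and keeps the n most frequent neighborhoods per borough. The sample rows and the helper name top_n_per_group are illustrative assumptions, not the asker's data or code. It groups on the first index level only and passes group_keys=False so the borough is not repeated in the index.

```python
import pandas as pd

# Hypothetical sample data; the real taxi data is not shown in full.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Midtown", "Midtown", "Lincoln Square",
                     "Lincoln Square", "Chelsea", "Melrose", "Melrose",
                     "Fordham"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
})


def top_n_per_group(frame, n):
    """Return the n largest neighborhood counts within each borough."""
    counts = frame["Neighborhood"].groupby(frame["Borough"]).value_counts()
    # level=0 is Borough; group_keys=False avoids a duplicated Borough level.
    return counts.groupby(level=0, group_keys=False).nlargest(n)


top2 = top_n_per_group(df, 2)
print(top2)
```

With n=5 on the full data this should give at most five rows per borough, fewer for boroughs with fewer than five neighborhoods.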
Answer by Alexander
You can do this in a single line by slightly extending your original groupby with 'nlargest':
>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64
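One thing worth checking with this variant: when nlargest is called directly on the counts Series, it ranks all rows together, so it returns the largest counts overall rather than per borough. A minimal sketch with made-up rows (the data and variable names are assumptions for illustration) compares the two behaviours:

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Borough": ["Manhattan"] * 7 + ["Bronx"] * 3,
    "Neighborhood": ["Midtown"] * 4 + ["Chelsea"] * 3
                    + ["Melrose"] * 2 + ["Fordham"],
})

# One row per (Borough, Neighborhood) pair with its pickup count.
counts = df.groupby(["Borough", "Neighborhood"]).size()

# nlargest on the whole Series: the largest counts overall.
overall_top2 = counts.nlargest(2)

# nlargest applied within each borough: the largest counts per borough.
per_borough_top1 = counts.groupby(level="Borough", group_keys=False).nlargest(1)

print(overall_top2)
print(per_borough_top1)
```

In this sample both rows of overall_top2 belong to Manhattan, while per_borough_top1 has one row for each borough.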
Answer by Khadijah Lawal
df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)
head(5) returns the first 5 rows of the result. Called on the counts Series as a whole, that is the first 5 rows overall rather than 5 per borough.
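If head is the preferred tool, one possible variation is to call it on a groupby of the counts instead of on the Series itself. The sketch below uses made-up rows and hypothetical variable names. It relies on value_counts sorting each group in descending order by default, so the first rows of each group are also its largest counts.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Borough": ["Queens"] * 6 + ["Brooklyn"] * 4,
    "Neighborhood": ["Astoria"] * 3 + ["Flushing"] * 2 + ["Jamaica"]
                    + ["Bushwick"] * 3 + ["Red Hook"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# head on the Series: the first rows of the whole result.
first_two_overall = counts.head(2)

# head on a groupby: the first rows of each borough. value_counts sorts
# each group in descending order by default, so these are the top counts.
first_two_per_borough = counts.groupby(level=0).head(2)

print(first_two_overall)
print(first_two_per_borough)
```

Unlike nlargest, groupby(...).head keeps the original index, so no group_keys argument is needed here.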