Pandas function operations
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41550187/
Question by Vivek
Data is from the United States Census Bureau. Counties are political and geographic subdivisions of states in the United States. This dataset contains population data for counties and states in the US from 2010 to 2015.
Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)
I can't get the state name out of my code; it returns a number instead. Please help.
My code:
import pandas as pd
import numpy as np
census_df = pd.read_csv('census.csv')
census_df.head()
def answer_five():
    return census_df.groupby('STNAME').COUNTY.sum().max()
answer_five()
Answer by dfadeeff
Here is the answer that worked for me:
def answer_five():
    return census_df.groupby(["STNAME"],sort=False).sum()["COUNTY"].idxmax()
The first part creates an aggregated DataFrame:
census_df.groupby(["STNAME"],sort=False).sum()
The second part selects the column you need:
["COUNTY"].idxmax()
and idxmax() returns the index label (here, the state name) corresponding to the maximum value.
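The answer above relies on idxmax() returning the index label of the largest aggregated value. Below is a minimal sketch on hypothetical toy rows. The column layout, the SUMLEV values 40 (state) and 50 (county), and COUNTY == 0 on state rows are all assumptions modelled on the question. It also shows a caveat: sum() adds up the county codes rather than counting counties, so the two approaches can pick different states.

```python
import pandas as pd

# Hypothetical toy rows shaped like census.csv (assumption: state summary
# rows have SUMLEV 40 and COUNTY 0; county rows have SUMLEV 50).
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["State A", "State A", "State A", "State B", "State B"],
    "CTYNAME": ["State A", "County A1", "County A2", "State B", "County B1"],
    "COUNTY": [0, 1, 3, 0, 5],
})

# Aggregate per state, then take the label of the largest value.
county_code_sums = census_df.groupby("STNAME", sort=False)["COUNTY"].sum()
print(county_code_sums)            # State A: 0+1+3 = 4, State B: 0+5 = 5
print(county_code_sums.idxmax())   # State B

# Counting county rows instead picks a different state on this toy data.
county_counts = census_df[census_df["SUMLEV"] == 50].groupby("STNAME").size()
print(county_counts.idxmax())      # State A (2 counties vs. 1)
```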
Answer by jasonlcy91
Here is a correction to your entire code.
First, according to the source, a SUMLEV of 50 means the row is a county. There are two ways to answer this.
Thought process (think of it like in Excel):
You want to count the number of "county rows" in each state group.
First, create the mask/condition that selects all SUMLEV == 50 rows (the "county rows").
Then group them by STNAME.
Then use .size() to count the number of rows in each group.
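# this is it!
def answer_five():
    # keep only county rows (SUMLEV 50 = county)
    mask = (census_df.SUMLEV == 50)
    # count rows per state, then return the state name with the largest count
    max_index = census_df[mask].groupby('STNAME').size().idxmax()
    return max_index

# not so elegant
def answer_five():
    # note: this adds a 'Counts' column to census_df in place
    census_df['Counts'] = 1
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME')['Counts'].sum().idxmax()
    return max_index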
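The thought process above (mask, group, count, pick the label) can be traced step by step. This sketch uses hypothetical toy rows and assumes the SUMLEV/STNAME/CTYNAME layout described in the question. In place of the in-place census_df['Counts'] = 1, it uses assign(), so the "ones column" variant leaves the input DataFrame unchanged.

```python
import pandas as pd

# Hypothetical toy rows shaped like census.csv (assumed layout:
# SUMLEV 40 = state summary row, SUMLEV 50 = county row).
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["State A", "State A", "State A", "State B", "State B"],
    "CTYNAME": ["State A", "County A1", "County A2", "State B", "County B1"],
})

# Step 1: boolean mask selecting only the "county rows".
mask = census_df["SUMLEV"] == 50
counties = census_df[mask]

# Step 2: group the county rows by state; size() counts rows per group.
per_state = counties.groupby("STNAME").size()
print(per_state)            # State A: 2, State B: 1

# Step 3: idxmax() returns the label (state name) of the largest count.
print(per_state.idxmax())   # State A

# The "Counts" variant sums a column of ones, giving the same numbers.
via_ones = counties.assign(Counts=1).groupby("STNAME")["Counts"].sum()
print(via_ones.idxmax())    # State A
```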
You are welcome. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html
Answer by Silvis Sora
Actually, you can just count the county rows in each state instead of looking into the county details.
And this should work:
census_df[census_df['SUMLEV']==50].groupby(['STNAME']).size().idxmax()
Answer by Aishwarya Kanchan
def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    x = new_df.groupby('STNAME')
    return x.count()['COUNTY'].idxmax()
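This answer uses count(), which counts the non-missing values in each column per group. By contrast, size() counts rows regardless of missing values. On the real census data the COUNTY column is presumably fully populated, so both should give the same answer there. The hypothetical toy frame below contains one made-up missing value to show where they diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical county rows; the missing COUNTY value is made up to show
# how count() and size() differ.
counties = pd.DataFrame({
    "STNAME": ["State A", "State A", "State B"],
    "COUNTY": [1, np.nan, 5],
})

grouped = counties.groupby("STNAME")
print(grouped.size())             # rows per group: State A 2, State B 1
print(grouped.count()["COUNTY"])  # non-null COUNTY values: State A 1, State B 1
```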
answer_five()
Answer by Anand Krishnan
We can also answer this question using the sum() function:
def answer_five():
    return census_df.groupby(["STNAME"]).sum()["COUNTY"].idxmax()
sum() adds up all the values in the COUNTY column, and applying idxmax() to the result finds the state with the highest number of counties.
Answer by yogs
def answer_five():
    county = census_df[census_df['SUMLEV']==50]
    county = county.groupby(['STNAME']).count()
    return county['SUMLEV'].idxmax(axis=0)
answer_five()
Answer by Nathan
It's the change from .max() to .idxmax() that returns the correct value, the STNAME, rather than a large integer.
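Series.max() returns the largest value itself, while Series.idxmax() returns the index label at which that value occurs. After a groupby('STNAME') aggregation, that index holds the state names. Here is an illustrative sketch on a hypothetical Series of per-state totals:

```python
import pandas as pd

# Hypothetical per-state totals, indexed by state name.
totals = pd.Series({"State A": 4, "State B": 5})

print(totals.max())     # 5: the largest value itself
print(totals.idxmax())  # State B: the index label where that value occurs
```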
Answer by Terk
def answer_five():
    return census_df.groupby('STNAME')['CTYNAME'].count().idxmax()
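Without a SUMLEV filter, this counts every row in each state group, including the state's own summary row. Assuming each state has exactly one such row, every count is shifted up by one, so idxmax() picks the same state even though the per-state numbers are off. Here is a sketch on hypothetical toy rows, assuming SUMLEV 40 marks state rows and 50 marks county rows:

```python
import pandas as pd

# Hypothetical toy rows (assumed layout: SUMLEV 40 = state summary row,
# SUMLEV 50 = county row, as in the question's hint).
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["State A", "State A", "State A", "State B", "State B"],
    "CTYNAME": ["State A", "County A1", "County A2", "State B", "County B1"],
})

# Without a SUMLEV filter, each state's own summary row is counted as well.
unfiltered = census_df.groupby("STNAME")["CTYNAME"].count()
# Filtering to SUMLEV == 50 counts county rows only.
filtered = census_df[census_df["SUMLEV"] == 50].groupby("STNAME")["CTYNAME"].count()

print(unfiltered)  # State A 3, State B 2 (each includes the state row)
print(filtered)    # State A 2, State B 1
print(unfiltered.idxmax(), filtered.idxmax())  # both State A
```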