Pandas function operations

Question

Asked by Vivek

Data is from the United States Census Bureau. Counties are political and geographic subdivisions of states in the United States. This dataset contains population data for counties and states in the US from 2010 to 2015.

Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)

I cannot fetch the county name from my code. Please help.

My code:

import pandas as pd
import numpy as np
census_df = pd.read_csv('census.csv')
census_df.head()

def answer_five():
    return census_df.groupby('STNAME').COUNTY.sum().max()

answer_five()

Answer 1

Answered by dfadeeff

Here is the answer that worked for me:

def answer_five():
    return census_df.groupby(["STNAME"],sort=False).sum()["COUNTY"].idxmax()

The first part creates an aggregated DataFrame:

census_df.groupby(["STNAME"],sort=False).sum()

The second part selects the column you need

["COUNTY"].idxmax()

and returns the index label (here, the state name) at which that column reaches its maximum.

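The two steps can be inspected separately on a small frame. This is a minimal sketch using made-up data rather than the real census.csv; the name toy_df and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for census.csv (values are made up).
toy_df = pd.DataFrame({
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Step 1: one row per state, with every remaining column summed.
aggregated = toy_df.groupby(["STNAME"], sort=False).sum()

# Step 2: pick the COUNTY column (a Series indexed by STNAME) ...
county_totals = aggregated["COUNTY"]

# ... and ask for the index label of its largest value.
top_state = county_totals.idxmax()
print(top_state)
```

Because STNAME becomes the index after groupby, idxmax() hands back a state name instead of a number.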

Answer 2

Answered by jasonlcy91

Just a correction to your entire code.

First, according to the source, a SUMLEV of 50 means the row is a county. There are two ways to answer this.

Thought process (think of it like in Excel): You want to count the number of "county rows" in each state group. First, create the mask/condition that selects all rows with SUMLEV == 50 (the "county rows"). Then group them by STNAME. Then use .size() to count the number of rows in each group.

# this is it!
def answer_five():
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME').size().idxmax()
    return max_index

# not so elegant
def answer_five():
    census_df['Counts'] = 1
    mask = (census_df.SUMLEV == 50)
    max_index = census_df[mask].groupby('STNAME')['Counts'].sum().idxmax()
    return max_index

You are welcome. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html

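For readers without census.csv at hand, the mask, group, and size steps can be tried on a tiny frame. This is only a sketch: the rows below are invented, and SUMLEV 40 is assumed here to mark a state-level summary row.

```python
import pandas as pd

# Hypothetical toy data. SUMLEV 50 marks a county row;
# 40 is assumed to mark the state-level summary row.
toy_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["A", "A", "A", "B", "B", "B", "B"],
    "CTYNAME": ["A", "A1 County", "A2 County",
                "B", "B1 County", "B2 County", "B3 County"],
})

# Keep only the county rows.
mask = toy_df["SUMLEV"] == 50

# Count the remaining rows in each state group.
counties_per_state = toy_df[mask].groupby("STNAME").size()

# The index label of the largest count is the state name.
most_counties = counties_per_state.idxmax()
print(counties_per_state)
print(most_counties)
```

Without the mask, each state group would also include its own summary row, so every count would be one too high.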

Answer 3

Answered by Silvis Sora

Actually, you can just count the county rows in each state instead of looking into the county details.

And this should work:

census_df[census_df['SUMLEV']==50].groupby(['STNAME']).size().idxmax()

Answer 4

Answered by Aishwarya Kanchan

def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    x = new_df.groupby('STNAME')
    return x.count()['COUNTY'].idxmax()


answer_five()

Answer 5

Answered by Anand Krishnan

We can also answer this question using the sum() function:

def answer_five():
    return census_df.groupby(["STNAME"]).sum()["COUNTY"].idxmax()

Using sum() adds up all the values in the COUNTY column for each state, and we can then apply idxmax() to find the state with the highest number of counties.

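One caveat may be worth checking before relying on this: if COUNTY holds county codes rather than a 1 per county, the largest sum of codes is not guaranteed to belong to the state with the most rows. The sketch below uses invented codes to show how the two can disagree; on the real data they may well agree.

```python
import pandas as pd

# Hypothetical toy data: state X has three counties with small codes,
# state Y has two counties with large codes.
toy_df = pd.DataFrame({
    "SUMLEV": [50, 50, 50, 50, 50],
    "STNAME": ["X", "X", "X", "Y", "Y"],
    "COUNTY": [1, 3, 5, 7, 9],
})

# Summing the codes favours the state with the larger code values.
by_sum = toy_df.groupby("STNAME")["COUNTY"].sum().idxmax()

# Counting rows favours the state with more counties.
by_size = toy_df.groupby("STNAME").size().idxmax()

print(by_sum, by_size)
```

Counting rows with .size() after filtering on SUMLEV == 50 avoids depending on how the codes are numbered.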

Answer 6

Answered by yogs


def answer_five():
    county = census_df[census_df['SUMLEV']==50]
    county = county.groupby(['STNAME']).count()

    return county['SUMLEV'].idxmax(axis=0)

answer_five()

Answer 7

Answered by Nathan

It's the change from .max() to .idxmax() that returns the correct value, the STNAME, rather than a large integer.

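The difference between the two calls is easy to see on a small Series. This is a sketch with made-up counts; the Series name and index name are chosen only for illustration.

```python
import pandas as pd

# Hypothetical per-state counts, indexed by state name.
counts = pd.Series({"A": 2, "B": 3}, name="counties")
counts.index.name = "STNAME"

largest_value = counts.max()     # the biggest number in the Series
largest_label = counts.idxmax()  # the index label where that number sits

print(largest_value)
print(largest_label)
```

Here max() gives the count itself, while idxmax() gives the state it belongs to.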

Answer 8

Answered by Terk

def answer_five():
    return census_df.groupby('STNAME')['CTYNAME'].count().idxmax()
