Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40523185/

Time: 2020-09-14 02:24:38  Source: igfitidea

Count the number of counties per state using Python (census data)

Tags: python, pandas, dataset, data-science

Asked by Bakhtawar

I am having trouble counting the number of counties using the well-known census.csv data.

Task: Count the number of counties in each state.

I think the problem is in the comparison step; please read below.

I've tried this:

df = pd.read_csv('census.csv')
dfd = df['STNAME'].unique()  # array of unique state names

serr = pd.Series(dfd)  # convert the array to a Series

After this, I tried two approaches:

1:

    df[df['STNAME'] == serr]  # ERROR: series length must match

2:

i = 0
for name in serr:  # This raises an error: 'Alabama'
    df['STNAME'] == name
    for i in serr:
        serr[i] == serr[name]
        print(serr[name].count)
        i+=1

Please guide me; I have been stuck on this for three days.

Answered by juanpa.arrivillaga

Use groupby and aggregate COUNTY using nunique:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('census.csv')

In [3]: unique_counties = df.groupby('STNAME')['COUNTY'].nunique()

Now the results:

In [4]: unique_counties
Out[4]: 
STNAME
Alabama                  68
Alaska                   30
Arizona                  16
Arkansas                 76
California               59
Colorado                 65
Connecticut               9
Delaware                  4
District of Columbia      2
Florida                  68
Georgia                 160
Hawaii                    6
Idaho                    45
Illinois                103
Indiana                  93
Iowa                    100
Kansas                  106
Kentucky                121
Louisiana                65
Maine                    17
Maryland                 25
Massachusetts            15
Michigan                 84
Minnesota                88
Mississippi              83
Missouri                116
Montana                  57
Nebraska                 94
Nevada                   18
New Hampshire            11
New Jersey               22
New Mexico               34
New York                 63
North Carolina          101
North Dakota             54
Ohio                     89
Oklahoma                 78
Oregon                   37
Pennsylvania             68
Rhode Island              6
South Carolina           47
South Dakota             67
Tennessee                96
Texas                   255
Utah                     30
Vermont                  15
Virginia                134
Washington               40
West Virginia            56
Wisconsin                73
Wyoming                  24
Name: COUNTY, dtype: int64
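
Since census.csv is not included here, the following is a minimal sketch of the same groupby/nunique idea on a small hand-made frame. The rows are made up for illustration; only the column names STNAME and COUNTY come from the question.

```python
import pandas as pd

# Hypothetical miniature stand-in for census.csv (values are illustrative only)
toy = pd.DataFrame({
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Number of distinct COUNTY codes within each state
unique_counties = toy.groupby("STNAME")["COUNTY"].nunique()
print(unique_counties.to_dict())
```

The result is a Series indexed by STNAME, so a single state can be looked up with unique_counties["A"].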

Answered by Dong Du

juanpa.arrivillaga has a great solution. However, the code needs a minor modification.

The "counties" with 'SUMLEV' == 40 or 'COUNTY' == 0 should be filtered out, because those rows are state-level summaries rather than actual counties. Otherwise, every county count is too large by one.

So, the correct answer should be:

unique_counties = census_df[census_df['SUMLEV'] == 50].groupby('STNAME')['COUNTY'].nunique()

with the following result:

STNAME
Alabama                  67
Alaska                   29
Arizona                  15
Arkansas                 75
California               58
Colorado                 64
Connecticut               8
Delaware                  3
District of Columbia      1
Florida                  67
Georgia                 159
Hawaii                    5
Idaho                    44
Illinois                102
Indiana                  92
Iowa                     99
Kansas                  105
Kentucky                120
Louisiana                64
Maine                    16
Maryland                 24
Massachusetts            14
Michigan                 83
Minnesota                87
Mississippi              82
Missouri                115
Montana                  56
Nebraska                 93
Nevada                   17
New Hampshire            10
New Jersey               21
New Mexico               33
New York                 62
North Carolina          100
North Dakota             53
Ohio                     88
Oklahoma                 77
Oregon                   36
Pennsylvania             67
Rhode Island              5
South Carolina           46
South Dakota             66
Tennessee                95
Texas                   254
Utah                     29
Vermont                  14
Virginia                133
Washington               39
West Virginia            55
Wisconsin                72
Wyoming                  23
Name: COUNTY, dtype: int64
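
To illustrate the off-by-one effect, here is a small sketch on made-up data, assuming (as in the answer above) that SUMLEV == 40 marks a state summary row with COUNTY == 0 and SUMLEV == 50 marks a real county. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: one state-level summary row per state plus its counties
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Without filtering, the summary row's COUNTY == 0 is counted as a county
unfiltered = toy.groupby("STNAME")["COUNTY"].nunique()

# Keep only county-level rows before grouping
county_rows = toy[toy["SUMLEV"] == 50]
filtered = county_rows.groupby("STNAME")["COUNTY"].nunique()

# Difference between the two counts for each state
diff = (unfiltered - filtered).to_dict()
print(diff)
```

In this toy frame each state has exactly one summary row, so the unfiltered count exceeds the filtered one by one for every state.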

Answered by MEdwin

@Bakhtawar - This is a very simple way:

df.groupby(df['STNAME']).count().COUNTY
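
One caveat worth checking: count() tallies non-null rows per group rather than distinct values, so any state-level summary row is included. The sketch below uses a made-up frame (hypothetical values, column names from the question) to compare the plain row count with a count restricted to SUMLEV == 50.

```python
import pandas as pd

# Hypothetical rows: one state-level summary row per state plus its counties
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Rows per state, summary rows included
row_counts = toy.groupby("STNAME")["COUNTY"].count()

# Rows per state after dropping the summary rows
county_counts = toy[toy["SUMLEV"] == 50].groupby("STNAME").size()

print(row_counts.to_dict())
print(county_counts.to_dict())
```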

Answered by Subhankar Nayak

Plain logic without using 'groupby' (this finds the state with the most counties):

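import pandas as pd

census_df = pd.read_csv('census.csv')
# Keep only county-level rows (SUMLEV == 50)
cdf = census_df[census_df['SUMLEV'] == 50]
states = cdf['STNAME'].unique()
max_count = 0
state = None
for name in states:
    count = len(cdf[cdf['STNAME'] == name])  # county rows for this state
    if count > max_count:
        max_count = count
        state = name
print(state)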

Result: 'Texas'
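
For comparison, the same loop can be sketched with value_counts and idxmax, which also avoids groupby. The frame below is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical rows: one state-level summary row per state plus its counties
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Count county-level rows per state name
counts = toy.loc[toy["SUMLEV"] == 50, "STNAME"].value_counts()

# Label of the state with the largest count
top_state = counts.idxmax()
print(top_state)
```

Here counts holds the per-state totals, so it answers the original question as well as the "which state has the most" variant.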