pandas: Count number of counties per state using python {census}
Original URL: http://stackoverflow.com/questions/40523185/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Bakhtawar
I am having trouble counting the number of counties using the well-known census.csv data.
Task: Count the number of counties in each state.
I think my problem is with the comparison; please see below.
I've tried this:
import pandas as pd

df = pd.read_csv('census.csv')
dfd = df[:]['STNAME'].unique()  # gives the names of the states
serr = pd.Series(dfd)           # convert the array to a Series
After this, I tried two approaches:
1:
df[df['STNAME'] == serr]  # ERROR: Series lengths must match
2:
i = 0
for name in serr:  # this generates the error 'Alabama'
    df['STNAME'] == name
    for i in serr:
        serr[i] == serr[name]
        print(serr[name].count)
        i += 1
Please guide me; I have been stuck on this for three days.
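A minimal sketch of what the question's loop seems to be aiming for, run on a tiny made-up DataFrame (the rows and county codes are hypothetical, not taken from census.csv). The key change is comparing the STNAME column with one state name at a time. df['STNAME'] == serr asks pandas to compare two Series element by element, which fails because the two Series do not line up (different lengths and labels). serr[name] looks up a state name as an index label in a Series indexed 0, 1, 2, ..., which is where the KeyError 'Alabama' comes from.

```python
import pandas as pd

# Hypothetical stand-in for census.csv: county rows only, two columns.
df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [1, 3, 5, 13, 16],
})

counts = {}
for name in df["STNAME"].unique():
    # Compare the column with a single string (a scalar), not with another Series.
    # The result is a boolean mask; summing it counts the True values.
    counts[name] = int((df["STNAME"] == name).sum())

print(counts)
```

This works, but it scans the whole column once per state; the groupby answers below do the same job in a single pass.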
Answer by juanpa.arrivillaga
Use groupby and aggregate COUNTY using nunique:
In [1]: import pandas as pd
In [2]: df = pd.read_csv('census.csv')
In [3]: unique_counties = df.groupby('STNAME')['COUNTY'].nunique()
Now the results:
In [4]: unique_counties
Out[4]:
STNAME
Alabama 68
Alaska 30
Arizona 16
Arkansas 76
California 59
Colorado 65
Connecticut 9
Delaware 4
District of Columbia 2
Florida 68
Georgia 160
Hawaii 6
Idaho 45
Illinois 103
Indiana 93
Iowa 100
Kansas 106
Kentucky 121
Louisiana 65
Maine 17
Maryland 25
Massachusetts 15
Michigan 84
Minnesota 88
Mississippi 83
Missouri 116
Montana 57
Nebraska 94
Nevada 18
New Hampshire 11
New Jersey 22
New Mexico 34
New York 63
North Carolina 101
North Dakota 54
Ohio 89
Oklahoma 78
Oregon 37
Pennsylvania 68
Rhode Island 6
South Carolina 47
South Dakota 67
Tennessee 96
Texas 255
Utah 30
Vermont 15
Virginia 134
Washington 40
West Virginia 56
Wisconsin 73
Wyoming 24
Name: COUNTY, dtype: int64
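To see what groupby('STNAME')['COUNTY'].nunique() does mechanically, here is a small sketch on made-up rows (hypothetical data, not the real file): the frame is split into one group per state, and each group is reduced to the number of distinct COUNTY values it contains. The result is a Series indexed by state name.

```python
import pandas as pd

# Hypothetical miniature of census.csv with only the two columns the answer uses.
df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [1, 3, 5, 13, 16],
})

# Split rows by state, then count the distinct COUNTY codes in each group.
unique_counties = df.groupby("STNAME")["COUNTY"].nunique()
print(unique_counties)
```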
Answer by Dong Du
juanpa.arrivillaga has a great solution. However, the code needs a minor modification.
Rows with 'SUMLEV' == 40 or 'COUNTY' == 0 are state-level summary rows, not counties, and should be filtered out. Otherwise, every state's county count is too large by one.
So, the correct answer should be:
unique_counties = census_df[census_df['SUMLEV'] == 50].groupby('STNAME')['COUNTY'].nunique()
with the following result:
STNAME
Alabama 67
Alaska 29
Arizona 15
Arkansas 75
California 58
Colorado 64
Connecticut 8
Delaware 3
District of Columbia 1
Florida 67
Georgia 159
Hawaii 5
Idaho 44
Illinois 102
Indiana 92
Iowa 99
Kansas 105
Kentucky 120
Louisiana 64
Maine 16
Maryland 24
Massachusetts 14
Michigan 83
Minnesota 87
Mississippi 82
Missouri 115
Montana 56
Nebraska 93
Nevada 17
New Hampshire 10
New Jersey 21
New Mexico 33
New York 62
North Carolina 100
North Dakota 53
Ohio 88
Oklahoma 77
Oregon 36
Pennsylvania 67
Rhode Island 5
South Carolina 46
South Dakota 66
Tennessee 95
Texas 254
Utah 29
Vermont 14
Virginia 133
Washington 39
West Virginia 55
Wisconsin 72
Wyoming 23
Name: COUNTY, dtype: int64
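A small sketch of why the SUMLEV filter changes every count by exactly one. It assumes, as this answer describes, that each state has one summary row with SUMLEV 40 and COUNTY 0 alongside its county rows (SUMLEV 50); the rows below are made up to mimic that layout.

```python
import pandas as pd

# Hypothetical rows: one state summary row (SUMLEV 40, COUNTY 0) per state,
# followed by that state's county rows (SUMLEV 50).
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 40, 50, 50],
    "STNAME": ["Alabama"] * 4 + ["Alaska"] * 3,
    "COUNTY": [0, 1, 3, 5, 0, 13, 16],
})

# Without the filter, COUNTY code 0 from the summary row counts as one more county.
unfiltered = census_df.groupby("STNAME")["COUNTY"].nunique()

# Keeping only county-level rows (SUMLEV == 50) drops that extra code.
filtered = census_df[census_df["SUMLEV"] == 50].groupby("STNAME")["COUNTY"].nunique()

print(pd.DataFrame({"unfiltered": unfiltered, "filtered": filtered}))
```

Filtering before grouping also keeps any other non-county summary rows out of the result, which is why it is safer than subtracting one afterwards.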
Answer by MEdwin
@Bakhtawar - This is a very simple way:
df.groupby(df['STNAME']).count().COUNTY
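A sketch of what this one-liner counts, on made-up rows that mimic the census layout (one SUMLEV 40 state row per state plus county rows; the data is hypothetical). count() tallies non-null rows per group rather than distinct values, so on the unfiltered file it also includes each state's summary row. Filtering to SUMLEV == 50 first gives the county count.

```python
import pandas as pd

# Hypothetical rows: a state summary row (SUMLEV 40) plus county rows (SUMLEV 50).
df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 40, 50, 50],
    "STNAME": ["Alabama"] * 4 + ["Alaska"] * 3,
    "COUNTY": [0, 1, 3, 5, 0, 13, 16],
})

# The answer's expression: number of non-null COUNTY rows per state,
# which includes the state summary row.
all_rows = df.groupby(df["STNAME"]).count().COUNTY

# Same counting method restricted to county-level rows.
counties_only = df[df["SUMLEV"] == 50].groupby("STNAME")["COUNTY"].count()

print(all_rows)
print(counties_only)
```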
Answer by Subhankar Nayak
A plain-loop approach without using groupby. Note that it finds the state with the most counties rather than counting the counties in every state:
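import pandas as pd

census_df = pd.read_csv('census.csv')
cdf = census_df.copy()
cdf = cdf[cdf['SUMLEV'] == 50]  # keep county-level rows only
ind = cdf['STNAME'].unique()    # the state names
m = 0
for i in ind:
    c = len(cdf[cdf['STNAME'] == i])  # number of counties in state i
    if c > m:                         # keep the largest count seen so far
        m = c
        state = i
print(state)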
Result: 'Texas'
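The running-maximum loop above can be sketched more compactly with value_counts() and idxmax(). This is an alternative technique, not the answer's own code, and the rows below are hypothetical county-level rows (already filtered to SUMLEV 50). value_counts() counts rows per state, and idxmax() returns the label of the largest count.

```python
import pandas as pd

# Hypothetical county-level rows standing in for the filtered census data.
cdf = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [1, 3, 5, 13, 16],
})

# Rows per state, then the state label with the highest count.
state = cdf["STNAME"].value_counts().idxmax()
print(state)
```

If two states tie for the maximum, this may pick a different one than the loop, which keeps the first state (in file order) that reaches the maximum.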