Compute percentage for each row in pandas dataframe
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31481803/
Asked by user308827
country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
y191 y192 y193 y194 y195 \
47052179 43361966 42736682 43196916 41751928
1187385 1201557 1172941 1176366 1192173
28211467 27668273 29742374 27543836 28104317
179000 193000 233338 276639 249688
12613922 12864425 13240395 14106139 15642337
In the data frame above, I would like to compute, for each row, the percentage of the overall total accounted for by that val_code, resulting in the following data frame.
I.e., sum up each row and divide by the total of all rows.
country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
perc
50.14947129
1.363631254
32.48344744
0.260213146
15.74323688
Right now I am doing this, but it is not working:
grp_df = df.groupby(['country_name', 'val_code']).agg()
pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))
Accepted answer by EdChum
Get the total for all the columns of interest and then add the percentage column:
In [35]:
# label-based slice: every column from 'y191' onward
total = np.sum(df.loc[:,'y191':].values)
df['percent'] = df.loc[:,'y191':].sum(axis=1)/total * 100
df
Out[35]:
country_name country_code val_code y191 y192 \
0 United States of America 231 1 47052179 43361966
1 United States of America 231 1 1187385 1201557
2 United States of America 231 1 28211467 27668273
3 United States of America 231 1 179000 193000
4 United States of America 231 1 12613922 12864425
y193 y194 y195 percent
0 42736682 43196916 41751928 50.149471
1 1172941 1176366 1192173 1.363631
2 29742374 27543836 28104317 32.483447
3 233338 276639 249688 0.260213
4 13240395 14106139 15642337 15.743237
So np.sum will sum all the values:
In [32]:
total = np.sum(df.loc[:,'y191':].values)
total
Out[32]:
434899243
We then call .sum(axis=1)/total * 100 on the columns of interest to sum row-wise, divide by the total, and multiply by 100 to get a percentage.
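For readers who want to try the steps above end to end, here is a minimal self-contained sketch. It rebuilds the question's data by hand and uses .loc with a closed label slice ('y191' to 'y195'), which is an assumption on my part so that re-running the code does not pull the new percent column into the sum. The variable name year_cols is illustrative only.

```python
import pandas as pd

# Data copied from the question; val_code runs 1..5 as in the asker's table.
df = pd.DataFrame({
    "country_name": ["United States of America"] * 5,
    "country_code": [231] * 5,
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
    "y193": [42736682, 1172941, 29742374, 233338, 13240395],
    "y194": [43196916, 1176366, 27543836, 276639, 14106139],
    "y195": [41751928, 1192173, 28104317, 249688, 15642337],
})

# Label slices with .loc include both endpoints.
year_cols = df.loc[:, "y191":"y195"]

# Grand total over every year column and every row.
total = year_cols.to_numpy().sum()

# Row sum divided by the grand total, expressed as a percentage.
df["percent"] = year_cols.sum(axis=1) / total * 100

print(total)
print(df[["val_code", "percent"]])
```

The grand total matches the 434899243 shown in the answer, and the percent column sums to 100 up to floating-point rounding.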
Answer by Alexander
You can get the percentages of each column using a lambda function as follows:
>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
y191 y192 y193 y194 y195
0 0.527231 0.508411 0.490517 0.500544 0.480236
1 0.013305 0.014088 0.013463 0.013631 0.013713
2 0.316116 0.324405 0.341373 0.319164 0.323259
3 0.002006 0.002263 0.002678 0.003206 0.002872
4 0.141342 0.150833 0.151969 0.163455 0.179920
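As a side note, the same per-column fractions can probably be obtained without a lambda by dividing the frame by its column sums. The sketch below uses only the y191 and y192 columns from the question; the names years, via_lambda and via_div are illustrative, not part of the original answer.

```python
import pandas as pd

# Two of the year columns from the question, enough to compare both approaches.
years = pd.DataFrame({
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
})

# Approach from the answer: apply a function to each column in turn.
via_lambda = years.apply(lambda col: col / col.sum())

# Vectorized alternative: years.sum() is indexed by column name,
# so the division lines up column by column.
via_div = years / years.sum()

# Multiply by 100 if percentages rather than fractions are wanted.
print(via_div * 100)
```

Each column of the result sums to 1, since every value is divided by its own column total.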
Your example does not have any duplicate values for val_code, so I'm unsure how you want your data to appear (i.e. show percent of total in column vs. total for each val_code group).
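If the goal is a share within each group rather than of the grand total, one possible sketch uses groupby with transform. The data below is hypothetical (two made-up countries, "A" and "B", with two year columns) and is only meant to show the shape of the computation, not the asker's real data.

```python
import pandas as pd

# Hypothetical data: two countries, two val_codes each.
df = pd.DataFrame({
    "country_name": ["A", "A", "B", "B"],
    "val_code": [1, 2, 1, 2],
    "y191": [10, 30, 5, 5],
    "y192": [20, 40, 5, 5],
})

# Sum across the year columns for each row.
row_total = df[["y191", "y192"]].sum(axis=1)

# Total per country, broadcast back so it aligns with the original rows.
group_total = row_total.groupby(df["country_name"]).transform("sum")

# Share of each row within its own country, as a percentage.
df["perc"] = row_total / group_total * 100

print(df[["country_name", "val_code", "perc"]])
```

Here the shares within country A are 30% and 70%, and within country B 50% each. Using transform instead of apply keeps the result aligned with the original rows, so it can be assigned straight back as a column.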

