
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/32835498/

Time: 2020-09-13 23:56:46  Source: igfitidea

Pandas python .describe() formatting/output

python pandas formatting output describe

Asked by Mike

I am trying to get the .describe() function to produce its output in a different layout. Here is the CSV data (testProp.csv):


'name','prop'
A,1
A,2
B,  4
A,  3
B,  5
B,  2

When I run the following:


from pandas import *

data = read_csv('testProp.csv')

temp = data.groupby('name')['prop'].describe()
temp.to_csv('out.csv')

The output is:


name       
A     count    3.000000
      mean     2.000000
      std      1.000000
      min      1.000000
      25%      1.500000
      50%      2.000000
      75%      2.500000
      max      3.000000
B     count    3.000000
      mean     3.666667
      std      1.527525
      min      2.000000
      25%      3.000000
      50%      4.000000
      75%      4.500000
      max      5.000000
dtype: float64

However, I want the data in the format below. I have tried transpose(), and I would like to keep using describe() and manipulate its output rather than switch to .agg([np.mean, np.max, ...]):


   count  mean         std          min  25%  50%  75%  max
A  3      2            1            1    1.5  2    2.5  3
B  3      3.666666667  1.527525232  2    3    4    4.5  5

Accepted answer by Anand S Kumar

One way to do this is to first call .reset_index() to reset the index of your temp DataFrame, and then use DataFrame.pivot to reshape it the way you want. Example:


In [24]: df = pd.read_csv(io.StringIO("""name,prop
   ....: A,1
   ....: A,2
   ....: B,  4
   ....: A,  3
   ....: B,  5
   ....: B,  2"""))

In [25]: temp = df.groupby('name')['prop'].describe().reset_index()

In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)

In [27]: newdf.columns.name = ''   #This is needed so that the name of the columns is not `'level_1'` .

In [28]: newdf
Out[28]:
      25%  50%  75%  count  max      mean  min       std
name
A     1.5    2  2.5      3    3  2.000000    1  1.000000
B     3.0    4  4.5      3    5  3.666667    2  1.527525

Then you can save this newdf to CSV.
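As a rough, self-contained sketch of the reset_index-then-pivot steps above, including the final save: the `isinstance` check is an assumption added here because more recent pandas releases return an already-wide DataFrame from a grouped describe(), and the column names `stat` and `value`, the `stat_order` list, and the use of `skipinitialspace=True` are illustrative choices, not part of the original answer.

```python
from io import StringIO

import pandas as pd

csv_text = """name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2"""

# skipinitialspace drops the blanks after the commas (an added precaution).
df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

desc = df.groupby("name")["prop"].describe()

# Assumption: on recent pandas this is already a wide DataFrame, so stack it
# back into the long (name, statistic) Series that the answer starts from.
if isinstance(desc, pd.DataFrame):
    desc = desc.stack()

# Long form: one row per (group, statistic) pair, with explicit column names.
long_form = desc.reset_index()
long_form.columns = ["name", "stat", "value"]

# Pivot so each statistic becomes its own column.
newdf = long_form.pivot(index="name", columns="stat", values="value")

# pivot sorts the statistic labels; restore the order describe() uses.
stat_order = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
newdf = newdf[stat_order]
newdf.columns.name = None  # keep the columns label out of the output

# With no path, to_csv returns the CSV text; pass 'out.csv' to write a file.
out_csv = newdf.to_csv()
print(out_csv)
```

Renaming the columns right after reset_index() avoids depending on the auto-generated labels (`level_1` and `0`) that the session above refers to.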


Answer by Vitalis

In pandas v0.22, you can use the unstack feature. Building on @Kumar's answer above, you can use the pandas stack/unstack feature and experiment with its variations.


from io import StringIO
import pandas as pd

# Data lines are left-aligned so the group names carry no leading spaces.
df = pd.read_csv(StringIO("""name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2"""))

df.shape   # (rows, columns) of the input
df
temp = df.groupby(['name'])['prop'].describe()
temp
temp.stack()  # or unstack() / unstack(level=-1); level can be -1 or 0

Check out the pandas unstack documentation for more details.
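To make the stack/unstack relationship concrete, here is a minimal sketch under the assumption that the grouped describe() result may be either a long Series (older pandas) or an already-wide DataFrame (more recent pandas). The variable names `wide`, `long_form` and `round_trip` are illustrative only.

```python
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO("""name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2"""), skipinitialspace=True)

wide = df.groupby("name")["prop"].describe()

# Assumption: older pandas returns a long Series here; unstack() moves the
# innermost index level (the statistic names) into columns.
if isinstance(wide, pd.Series):
    wide = wide.unstack()

# stack() is the inverse: columns become the innermost index level again.
long_form = wide.stack()

# unstack() (equivalent to unstack(level=-1)) restores one row per group.
round_trip = long_form.unstack()

print(wide)
print(long_form)
```

Calling `long_form.unstack(level=0)` instead would pivot the group names into columns, giving one row per statistic.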
