Notice: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19124148/

Time: 2020-08-19 12:55:31  Source: igfitidea

Modify output from Python Pandas describe

Tags: python, pandas

Asked by KHibma

Is there a way to omit some of the output from pandas describe? This command gives me what I want as a table (the count and mean of executeTime by simpleDate):

df.groupby('simpleDate').executeTime.describe().unstack(1)

However, count and mean are all I want. I want to drop std, min, max, etc. So far I've only read how to modify column size.

I'm guessing the answer is going to be to rewrite the line without using describe, but I haven't had any luck grouping by simpleDate and getting the count together with the mean of executeTime.

I can do count by date:

df.groupby(['simpleDate']).size()

or the mean executeTime by date:

df.groupby(['simpleDate']).mean()['executeTime'].reset_index()

But I can't figure out the syntax to combine them.

My desired output:

            count    mean
09-10-2013      8  20.523
09-11-2013      4  21.112
09-12-2013      3  18.531
...            ..     ...

Accepted answer by Jeff

describe returns a Series, so you can just select the entries you want:

In [5]: import numpy as np; from pandas import Series

In [6]: s = Series(np.random.rand(10))

In [7]: s
Out[7]: 
0    0.302041
1    0.353838
2    0.421416
3    0.174497
4    0.600932
5    0.871461
6    0.116874
7    0.233738
8    0.859147
9    0.145515
dtype: float64

In [8]: s.describe()
Out[8]: 
count    10.000000
mean      0.407946
std       0.280562
min       0.116874
25%       0.189307
50%       0.327940
75%       0.556053
max       0.871461
dtype: float64

In [9]: s.describe()[['count','mean']]
Out[9]: 
count    10.000000
mean      0.407946
dtype: float64
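
For the grouped case in the question, the same "compute only what you need" idea can be sketched with agg instead of describe. This is a minimal sketch, assuming a DataFrame with simpleDate and executeTime columns as in the question; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'simpleDate': ['09-10-2013', '09-10-2013', '09-11-2013'],
    'executeTime': [20.0, 21.0, 18.5],
})

# Compute only the statistics of interest for each date.
result = df.groupby('simpleDate')['executeTime'].agg(['count', 'mean'])
print(result)
```

The result has one row per date and exactly two columns, count and mean, which matches the layout of the desired output.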

Answered by Rafa

The .describe() method generates a DataFrame where count, std, max, ... are values of the index, so according to the documentation you should select them by label with .loc, for example:

df.describe().loc[['count','max']]
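
A small self-contained sketch of that selection follows. The column names a and b and their values are hypothetical, chosen only to make the example runnable.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10.0, 20.0, 30.0, 40.0]})

# describe() on a DataFrame puts the statistic names in the row index,
# so .loc with a list of labels keeps only those rows.
summary = df.describe().loc[['count', 'max']]
print(summary)
```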

Answered by st19297

The solution @Jeff provided only works for a Series.

@Rafa is on point: df.describe().info() reveals that the resulting DataFrame has "Index: 8 entries, count to max".

df.describe().loc[['count','max']] does work, but df.groupby('simpleDate').describe().loc[['count','max']], which is what the OP asked for, does not work.
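
The difference can be seen by inspecting the shape of the grouped result. The sketch below assumes a recent pandas release, where describe on a grouped column returns one row per group with the statistic names as columns; the Y/Z sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                   'Z': [10, 5, 6, 11, 12]})

desc = df.groupby('Y')['Z'].describe()
print(desc.index.tolist())    # group labels
print(desc.columns.tolist())  # statistic names

# Row lookup by statistic name fails because the rows are groups.
row_lookup_failed = False
try:
    desc.loc[['count', 'max']]
except KeyError:
    row_lookup_failed = True

# Column selection is what works on this shape.
subset = desc[['count', 'max']]
print(subset)
```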

I think a solution may be this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                   'Z': [10, 5, 6, 11, 12]})

Grouping the df by Y:

df_grouped = df.groupby(by='Y')

In [207]: df_grouped.agg([np.mean, len])
Out[207]:
        Z    
     mean len
Y            
A  10.500   2
B   7.667   3
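
If flat column names are preferred over the two-level header shown above, named aggregation is one option. This is a sketch assuming pandas 0.25 or later, where agg accepts keyword=(column, function) pairs; the output names count and mean are my own choice.

```python
import pandas as pd

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                   'Z': [10, 5, 6, 11, 12]})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
flat = df.groupby('Y').agg(count=('Z', 'size'), mean=('Z', 'mean'))
print(flat)
```

Here size counts all rows in each group, like len in the example above; passing 'count' instead would skip missing values.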

Answered by Geoff Counihan

Sticking with describe, you can unstack the index and then slice normally too:

df.describe().unstack()[['count','max']]
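
The shape that unstack produces depends on what describe was called on and on the pandas version, so the line above may need adjusting. As a hedged alternative for a plain DataFrame (not the answer's own code), transposing the describe output also turns the statistic names into columns that can be sliced normally. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10.0, 20.0, 30.0, 40.0]})

# After transposing, each row is an original column and each
# column is a statistic, so ordinary column selection applies.
by_column = df.describe().T[['count', 'max']]
print(by_column)
```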
