Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/45704810/

Time: 2020-09-14 04:15:27  Source: igfitidea

Pandas standard deviation on one column for subset of rows

python pandas statistics standard-deviation

Asked by Thomas

I'm new to working with Python and Pandas. Currently I'm attempting to create a report that extracts data from an SQL database and uses that data in a pandas dataframe. Each row holds a server name and the date of the sample, followed by one column per sampled metric.

I have been able to filter by the hostname using df[df['hostname'] == uniquehost], where df is the variable holding the dataframe and uniquehost is a variable holding each unique host name.

What I am trying to do next is obtain the stdev of the other columns, but I haven't been able to figure this part out. I attempted to use df[df['hostname'] == uniquehost].std()

However, this wasn't correct.

Can anyone point me in the right direction to figure this out? I suspect I'm barking up the wrong tree and there's likely a very easy way to handle this that I haven't encountered yet.

Hostname | Sample Date | CPU Peak | Memory Peak 
server1 | 08/08/17 | 67.32 | 34.83 
server1 | 08/09/17 | 34 | 62

Accepted answer by cs95

IIUC, you'll want to first do df.groupby on Hostname and then find the standard deviation. Something like this:

In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]: 
           CPU Peak  Memory Peak
Hostname                        
server1   23.560798    19.212091
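
For anyone who wants to try this without the original database, here is a minimal self-contained sketch. It rebuilds the two sample rows from the question by hand (the column names are taken from the sample table, and the `metric_cols`, `per_host_std` and `one_host_std` names are only illustrative), then computes the standard deviation both with groupby and with the boolean filter from the question restricted to the numeric columns. Whether non-numeric columns were the actual cause of the problem in the question is an assumption; the question does not say what went wrong.

```python
import pandas as pd

# Sample rows copied by hand from the question's table.
df = pd.DataFrame(
    {
        "Hostname": ["server1", "server1"],
        "Sample Date": ["08/08/17", "08/09/17"],
        "CPU Peak": [67.32, 34.0],
        "Memory Peak": [34.83, 62.0],
    }
)

# Illustrative name: the columns we want statistics for.
metric_cols = ["CPU Peak", "Memory Peak"]

# Standard deviation per host, for every host in one call.
per_host_std = df.groupby("Hostname")[metric_cols].std()
print(per_host_std)

# Same figure for a single host, using the filter from the question
# but keeping only the numeric metric columns before calling std().
uniquehost = "server1"
one_host_std = df.loc[df["Hostname"] == uniquehost, metric_cols].std()
print(one_host_std)
```

Note that pandas' std() defaults to ddof=1, i.e. the sample standard deviation. Pass ddof=0 if you want the population standard deviation instead.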