Original source: http://stackoverflow.com/questions/45704810/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas standard deviation on one column for subset of rows
Asked by Thomas
I'm new to working with Python and Pandas. Currently I'm attempting to create a report that extracts data from an SQL database and uses that data in a pandas dataframe. Each row holds a server name and the date of the sample, followed by the sample data, one metric per column.
I have been able to filter by hostname using df[df['hostname'] == uniquehost], where df is the dataframe and uniquehost is a variable holding each unique host name.
What I am trying to do next is obtain the standard deviation of the other columns, but I haven't been able to figure this part out. I attempted to use df[df['hostname'] == uniquehost].std()
However, this wasn't correct.
Can anyone point me in the right direction to get this figured out? I suspect I'm barking up the wrong tree and there's likely a very easy way to handle this that I haven't encountered yet.
Hostname | Sample Date | CPU Peak | Memory Peak
server1 | 08/08/17 | 67.32 | 34.83
server1 | 08/09/17 | 34 | 62
Accepted answer by cs95
IIUC, you'll want to first do df.groupby on Hostname and then find the standard deviation. Something like this:
In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]:
           CPU Peak  Memory Peak
Hostname
server1   23.560798    19.212091
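For reference, the one-liner above can be reproduced end to end with the two sample rows from the question. This is a minimal sketch, not the asker's actual setup: the DataFrame is built by hand instead of being read from SQL, and the names metric_cols, per_host_std and one_host_std are made up for illustration. The question does not say what went wrong with the original attempt; one possible cause, assumed here, is that non-numeric columns such as the sample date were included, so the sketch selects the numeric columns explicitly.

```python
import pandas as pd

# Sample rows copied from the question; the real data would come from SQL.
df = pd.DataFrame({
    "Hostname": ["server1", "server1"],
    "Sample Date": ["08/08/17", "08/09/17"],
    "CPU Peak": [67.32, 34.0],
    "Memory Peak": [34.83, 62.0],
})

# Hypothetical helper name: the numeric columns to summarise.
metric_cols = ["CPU Peak", "Memory Peak"]

# Standard deviation per host, restricted to the numeric metric columns.
per_host_std = df.groupby("Hostname")[metric_cols].std()
print(per_host_std)

# The same figures for a single host, using the boolean filter from the question.
uniquehost = "server1"
one_host_std = df.loc[df["Hostname"] == uniquehost, metric_cols].std()
print(one_host_std)
```

Note that pandas computes the sample standard deviation by default (ddof=1), so a host with only one sample yields NaN rather than 0.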