Original source: http://stackoverflow.com/questions/15485793/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python - Pandas - DataFrame reduce rows
Asked by peu ping
I have a DataFrame like so:
ind  col1  col2
1    12    string1  ...
2    23    string2  ...
3    34    string1  ...
4    13    string2  ...
5    17    string3  ...
...  ...   ...      ...
I want to collapse the DataFrame so that col2 will be unique. In col1 (and all the other numerical columns), I want to put the median of all the values where col2 was equal.
I know I can extract df[df["col2"] == "stringN"], calculate the medians and build a new DataFrame, but is there a more elegant/pythonic way to do this?
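For reference, the manual approach described above might look roughly like the sketch below. The sample values are copied from the table in the question; the names `rows` and `manual` are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the question (only the columns that are shown).
df = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5],
    "col1": [12, 23, 34, 13, 17],
    "col2": ["string1", "string2", "string1", "string2", "string3"],
})

rows = []
for key in df["col2"].unique():
    # Select the rows that share this col2 value.
    subset = df[df["col2"] == key]
    # Median of every numeric column in the subset, as a plain dict.
    row = subset.median(numeric_only=True).to_dict()
    row["col2"] = key
    rows.append(row)

# Build the collapsed frame: one row per distinct col2 value.
manual = pd.DataFrame(rows)
print(manual)
```

This works, but it filters the frame once per distinct value and needs explicit bookkeeping, which is what the question hopes to avoid.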
Accepted answer by DSM
You can use groupby to collect the rows by col2 and then call .median():
>>> df
   ind  col1     col2
0    1    12  string1
1    2    23  string2
2    3    34  string1
3    4    13  string2
4    5    17  string3
>>> df.groupby("col2")
<pandas.core.groupby.DataFrameGroupBy object at 0x9f41b8c>
>>> df.groupby("col2").median()
         ind  col1
col2
string1    2    23
string2    3    18
string3    5    17
>>> df.groupby("col2").median().reset_index()
      col2  ind  col1
0  string1    2    23
1  string2    3    18
2  string3    5    17
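The session above can also be written as a standalone script. The sketch below rebuilds the sample frame from the question and assumes a reasonably recent pandas. It uses `as_index=False` in place of the separate `reset_index()` call, and `numeric_only=True` so that any extra text columns are skipped; both are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the frame printed in the session above.
df = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5],
    "col1": [12, 23, 34, 13, 17],
    "col2": ["string1", "string2", "string1", "string2", "string3"],
})

# as_index=False keeps col2 as an ordinary column instead of the index.
# numeric_only=True restricts the median to numeric columns.
result = df.groupby("col2", as_index=False).median(numeric_only=True)
print(result)
```

The medians come back as floats, since a median of an even number of integers need not be an integer.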
Note that the result includes the medians of the ind values as well. See also .mean(), .min(), and .max(), or you can write your own aggregation function if you'd prefer.
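As one possible way to write your own aggregation, the sketch below passes a custom function to `.agg()` alongside a built-in one. The helper `value_range` and the output column names are hypothetical, chosen only for illustration, and the sample data is again copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5],
    "col1": [12, 23, 34, 13, 17],
    "col2": ["string1", "string2", "string1", "string2", "string3"],
})

def value_range(values):
    """Hypothetical custom aggregate: largest value minus smallest value."""
    return values.max() - values.min()

# Named aggregation: each keyword is an output column, given as
# (input column, aggregation), where the aggregation is a name or a callable.
summary = df.groupby("col2").agg(
    col1_median=("col1", "median"),
    col1_range=("col1", value_range),
).reset_index()
print(summary)
```

Restricting the aggregation to col1 also avoids computing a median for ind, which is rarely meaningful if ind is only a row identifier.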

