Python - Pandas - DataFrame reduce rows

Question

Asked by peu ping

I have a DataFrame like so:

ind  col1 col2
1    12   string1  ...
2    23   string2 ...
3    34   string1 ...
4    13   string2 ...
5    17   string3 ...
...  ...  ...     ...

I want to collapse the DataFrame so that col2 will be unique. In col1 (and all the other numerical columns), I want to put the median of all the values where col2 was equal.

I know I can extract df[df["col2"] == "stringN"], calculate the medians and build a new DataFrame, but is there a more elegant/pythonic way to do this?
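
For reference, the manual approach described above might look roughly like the sketch below. The sample data is copied from the table in the question; the variable names (`rows`, `manual`) are only illustrative, and `numeric_only=True` is an assumption so that string columns are skipped when taking the median.

```python
import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5],
    "col1": [12, 23, 34, 13, 17],
    "col2": ["string1", "string2", "string1", "string2", "string3"],
})

rows = []
for key in df["col2"].unique():
    # Extract the rows that share one col2 value
    subset = df[df["col2"] == key]
    # Median of every numeric column in that subset
    medians = subset.median(numeric_only=True).to_dict()
    rows.append({"col2": key, **medians})

# Build a new DataFrame with one row per distinct col2 value
manual = pd.DataFrame(rows)
print(manual)
```

This works, but it filters the frame once per distinct value and needs explicit bookkeeping to reassemble the result.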

Answer 1

Accepted answer by DSM

You can use groupby to collect the rows by col2 and then call .median():

>>> df
   ind  col1     col2
0    1    12  string1
1    2    23  string2
2    3    34  string1
3    4    13  string2
4    5    17  string3
>>> df.groupby("col2")
<pandas.core.groupby.DataFrameGroupBy object at 0x9f41b8c>
>>> df.groupby("col2").median()
         ind  col1
col2              
string1    2    23
string2    3    18
string3    5    17
>>> df.groupby("col2").median().reset_index()
      col2  ind  col1
0  string1    2    23
1  string2    3    18
2  string3    5    17

Note that the result has the medians of the ind values as well. See also .mean(), .min(), .max(), or you can roll your own aggregation function if you'd prefer.
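
As a minimal sketch of those two points, the snippet below selects only col1 before aggregating (so the median of ind is not computed) and then passes a custom function to .agg(). The helper name `value_range` is hypothetical and only there to illustrate rolling your own; the data is the same sample frame shown above.

```python
import pandas as pd

# Same sample frame as in the session above
df = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5],
    "col1": [12, 23, 34, 13, 17],
    "col2": ["string1", "string2", "string1", "string2", "string3"],
})

# Select col1 before aggregating so ind is left out;
# as_index=False keeps col2 as a regular column instead of the index
medians = df.groupby("col2", as_index=False)["col1"].median()
print(medians)

# A hypothetical custom aggregation: spread between largest and smallest value
def value_range(series):
    return series.max() - series.min()

# Built-in aggregations can be mixed with your own function in one call
summary = df.groupby("col2")["col1"].agg(["median", "min", "max", value_range])
print(summary)
```

If the frame also contains non-numeric columns besides col2, recent pandas versions may raise an error on .median() unless you select the numeric columns first or pass numeric_only=True.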
