How do I find the median using Pandas on a dataset?

Question

Asked by Vinayak

I have a DataFrame, data, with three columns: Date, Segment and Metric. I am doing the following:

data = pandas.read_csv("Filename.csv")
ave = data.groupby('Segment').mean() #works
ave = data.groupby('Segment').median() #gives error
ave['median'] = data.groupby('Segment').median()

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1453, in __setitem__
    self._set_item(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1488, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 301, in _set_item
    self._data.set(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 616, in set
    assert(value.shape[1:] == self.shape[1:])
AssertionError

Answer 1

Answered by Rutger Kassies

What error do you get with this?

ave = data.groupby('Segment').median()

I think that should work. Maybe something in your data is causing the error, such as NaNs, but I'm just guessing. You could try applying your own median function to see whether that works around the cause of the error, something like:

import numpy as np

def mymed(group):
    # Drop missing values before taking the median.
    return np.median(group.dropna())

ave = data.groupby('Segment')['Metric'].apply(mymed)

It would be easier to help if you could provide some sample data that reproduces the error.
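Since the question includes no sample data, here is a minimal sketch on made-up values. The rows below are hypothetical; only the column names come from the question. It selects the Metric column before calling median, which yields a Series indexed by Segment. Assigning that Series as a single new column may avoid the shape mismatch shown in the traceback, though that is an assumption rather than something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "Date": ["2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04", "2013-01-05"],
    "Segment": ["a", "a", "a", "b", "b"],
    "Metric": [1.0, np.nan, 3.0, 10.0, 20.0],
})

# Selecting one column first gives a Series indexed by Segment.
# The groupby median skips NaN values.
med = data.groupby("Segment")["Metric"].median()
print(med)

# Build the per-group mean as a one-column DataFrame,
# then attach the median Series as a second column.
ave = data.groupby("Segment")[["Metric"]].mean()
ave["median"] = med
print(ave)
```

The assignment aligns on the Segment index, so each group's median lands on the matching row of ave.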

Here is a different approach: you can add the median back to your original DataFrame. For the Metric column, that becomes:

data['metric_median'] = data.groupby('Segment')['Metric'].transform('median')

Whether it's useful to have the group median attached to each data point depends on what you want to do afterwards.
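As one illustration of what could come afterwards, the sketch below uses the per-row group median to compute each row's deviation from its group's median. The sample rows and the diff_from_median column name are made up for this example.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    "Segment": ["a", "a", "a", "b", "b"],
    "Metric": [1.0, 2.0, 6.0, 10.0, 20.0],
})

# transform returns a result aligned with the original rows,
# so every row carries the median of its own group.
data["metric_median"] = data.groupby("Segment")["Metric"].transform("median")

# Example follow-up: how far each value lies from its group median.
data["diff_from_median"] = data["Metric"] - data["metric_median"]
print(data)
```

If only one value per group is needed, the plain groupby median is the more compact choice; transform is mainly useful when the result has to line up with the original rows.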

Answer 2

Answered by Deependra Mishra

I think we can calculate the median using the following code:

print(data['segment'].median())

If that doesn't work, we can try replacing the missing data with the mean value and then calculating the median.
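A small sketch of that idea on made-up numbers follows. Note that Series.median already skips NaN by default, so filling is not required just to get a result, and filling with the mean can change the median. The values and variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical metric values with one missing entry.
metric = pd.Series([1.0, 2.0, np.nan, 9.0], name="Metric")

# Default behaviour: NaN is skipped, so this is the median of 1, 2 and 9.
median_skipping_nan = metric.median()

# Alternative from the answer: fill missing values with the mean first.
# The mean of 1, 2 and 9 is 4, so the filled values are 1, 2, 4 and 9.
filled = metric.fillna(metric.mean())
median_after_fill = filled.median()

print(median_skipping_nan, median_after_fill)
```

The two results differ here, so it is worth deciding which behaviour is wanted before imputing.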
