Original question: http://stackoverflow.com/questions/13063259/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How do I find the median using pandas on a dataset?
Asked by Vinayak
I have a DataFrame, data, with three columns: Date, Segment and Metric. I am doing the following:
import pandas

data = pandas.read_csv("Filename.csv")
ave = data.groupby('Segment').mean()  # works
ave = data.groupby('Segment').median()  # gives error
ave['median'] = data.groupby('Segment').median()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1453, in __setitem__
    self._set_item(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1488, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 301, in _set_item
    self._data.set(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 616, in set
    assert(value.shape[1:] == self.shape[1:])
AssertionError
Answered by Rutger Kassies
What error do you get with:
ave = data.groupby('Segment').median()
I think that should work. Maybe something in your data is causing the error, such as NaNs, but I'm just guessing. You could try applying your own median function to see whether you can work around the cause of the error, something like:
import numpy as np

def mymed(group):
    # Drop NaN values before taking the median of the group
    return np.median(group.dropna())
ave = data.groupby('Segment')['Metric'].apply(mymed)
It would be easier if you could provide some sample data which replicates the error.
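As a rough illustration of what such sample data could look like, here is a minimal sketch. The values are invented, and the column names Segment and Metric are assumed from the question's code. On data like this, the built-in grouped median skips NaN by default, so it should agree with the custom function:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
data = pd.DataFrame({
    "Date": ["2012-10-01", "2012-10-02", "2012-10-03",
             "2012-10-01", "2012-10-02", "2012-10-03"],
    "Segment": ["A", "A", "A", "B", "B", "B"],
    "Metric": [1.0, 3.0, np.nan, 10.0, 20.0, 60.0],
})

def mymed(group):
    # Drop NaN values before taking the median of the group
    return np.median(group.dropna())

# Built-in median on the selected column (NaN is skipped by default)
builtin = data.groupby("Segment")["Metric"].median()
# Custom median applied per group
custom = data.groupby("Segment")["Metric"].apply(mymed)

print(builtin)
print(custom)
```

If the two results differ on the real data, that would point to something other than missing values as the cause.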
Here is a different approach: you can add the median back to your original DataFrame. For the Metric column, that becomes:
data['metric_median'] = data.groupby('Segment')['Metric'].transform('median')
Whether it's useful to have the group median attached to each data point depends on what you want to do afterwards.
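To make the difference between the two shapes concrete, here is a small sketch on invented data (column names assumed from the question's code). Selecting a single column before calling median gives one value per group, which can be assigned as one column of the grouped means, while transform gives one value per row:

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration.
data = pd.DataFrame({
    "Segment": ["A", "A", "A", "B", "B"],
    "Metric": [1.0, 2.0, 9.0, 4.0, 6.0],
})

# One median per group: a Series indexed by Segment
per_group = data.groupby("Segment")["Metric"].median()

# A Series can be assigned as a single column next to the group means
ave = data.groupby("Segment")[["Metric"]].mean()
ave["median"] = per_group

# One median per row, aligned with the original index
data["metric_median"] = data.groupby("Segment")["Metric"].transform("median")

# Example follow-up: how far each row is from its group's median
data["deviation"] = data["Metric"] - data["metric_median"]

print(ave)
print(data)
```

The per-row form is convenient for comparisons like the deviation column; the per-group form is more compact if only a summary table is needed.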
Answered by Deependra Mishra
I think we can calculate the median using the following code.
print(data['segment'].median())
If that doesn't work, we can try replacing the missing values with the mean and then calculating the median.
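A minimal sketch of that idea, assuming the numeric column is named Metric (as in the question's code) and using invented values. Note that filling with the mean can shift the median compared with simply skipping the missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the values are made up for illustration.
data = pd.DataFrame({"Metric": [1.0, 2.0, np.nan, 9.0]})

# Replace missing values with the column mean (mean() skips NaN by default)
filled = data["Metric"].fillna(data["Metric"].mean())

print(filled.median())          # median after filling with the mean
print(data["Metric"].median())  # median with NaN skipped instead
```

Because the two results can differ, it is worth checking which treatment of missing data suits the analysis before filling.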

