How to apply function to dataframe in place
Original URL: http://stackoverflow.com/questions/28661258/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Question by hlin117
Is there a way I could use a SciPy function like norm.cdf in place on a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?
The background is that I have a table of z-score values that I would like to convert to CDF values of the normal distribution. I'm currently using norm.cdf from scipy for this.
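For reference, converting a z-score z to a CDF value means evaluating the standard normal CDF Φ(z), which is what norm.cdf computes with its default loc=0 and scale=1:

```latex
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2}\,dt
        = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right]
```

So Φ(0) = 0.5, and a z-score of −1.54 maps to roughly 0.062.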
I'm currently manipulating a dataframe that has non-numeric values.
  Name      Val1      Val2      Val3      Val4
0    A -1.540369 -0.077779  0.979606 -0.667112
1    B -0.787154  0.048412  0.775444 -0.510904
2    C -0.477234  0.414388  1.250544 -0.411658
3    D -1.430851  0.258759  1.247752 -0.883293
4    E -0.360181  0.485465  1.123589 -0.379157
(Making the Name variable an index is a solution, but in my actual dataset, the names are not alphabetical characters.)
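The index-based workaround mentioned above can be sketched as follows. This is a minimal sketch assuming a two-row excerpt of the sample table (values copied from it); nothing else about the real dataset is assumed. Once Name is the index, every remaining column is numeric, so norm.cdf can convert the whole block in one vectorized call:

```python
import pandas as pd
from scipy.stats import norm

# Two-row excerpt of the sample table (assumed for illustration).
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Val1": [-1.540369, -0.787154],
    "Val2": [-0.077779, 0.048412],
})

# Move the non-numeric column into the index so every remaining column is numeric.
indexed = df.set_index("Name")

# norm.cdf is vectorized: one call converts the whole block and returns an ndarray,
# which is wrapped back into a DataFrame with the same row and column labels.
cdf = pd.DataFrame(norm.cdf(indexed.to_numpy()),
                   index=indexed.index, columns=indexed.columns)

# Turn Name back into an ordinary column. Note that this rebinds df to a new
# object rather than modifying the original DataFrame in place.
df = cdf.reset_index()
print(df)
```

The trade-off is that df ends up pointing at a new object, so any other reference to the original DataFrame still sees the raw z-scores.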
To modify only the numeric data, I'm using df._get_numeric_data(), a private function that returns a DataFrame containing the DataFrame's numeric columns. However, there is no corresponding set function. Hence, if I call
norm.cdf(df._get_numeric_data())
this won't change df's original data, because norm.cdf returns a new array instead of writing back into df.
I'm trying to get around this by applying norm.cdf to the numeric DataFrame in place, so that it changes my original dataset.
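The behaviour described in the question can be reproduced with a minimal sketch, assuming a two-row excerpt of the sample table. To avoid the private API, it selects the numeric columns with the public select_dtypes instead of _get_numeric_data; the point is the same either way: norm.cdf hands back a fresh NumPy array and leaves df alone.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Two-row excerpt of the sample table (assumed for illustration).
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Val1": [-1.540369, -0.787154],
    "Val2": [-0.077779, 0.048412],
})
before = df.copy()  # snapshot used to check that df is unchanged

# norm.cdf builds and returns a new ndarray; nothing is written back into df.
result = norm.cdf(df.select_dtypes(include=[np.number]))

print(type(result))       # a plain NumPy array, detached from df
print(df.equals(before))  # df still holds the original z-scores
```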
Answer by Andy Hayden
I think I would prefer select_dtypes over _get_numeric_data, since select_dtypes is part of the public pandas API while _get_numeric_data is private:
In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157
Although apply doesn't offer an inplace option, you could do something like the following (which I would argue is more explicit anyway):
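import numpy as np
from scipy.stats import norm
num_df = df.select_dtypes(include=[np.number])  # numeric columns only
df[num_df.columns] = norm.cdf(num_df.values)  # write CDF values back into df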
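A self-contained version of this approach, assuming the five-row sample table from the question, looks like this. It selects the numeric columns, runs norm.cdf on their values, and assigns the resulting array back to the same column labels in df, so the Name column is left untouched:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Sample table from the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Val1": [-1.540369, -0.787154, -0.477234, -1.430851, -0.360181],
    "Val2": [-0.077779, 0.048412, 0.414388, 0.258759, 0.485465],
    "Val3": [0.979606, 0.775444, 1.250544, 1.247752, 1.123589],
    "Val4": [-0.667112, -0.510904, -0.411658, -0.883293, -0.379157],
})
z_scores = df.copy()  # kept only to check the conversion afterwards

# Pick out the numeric columns with the public select_dtypes API.
num_df = df.select_dtypes(include=[np.number])

# Convert every z-score to its standard normal CDF value and write the
# resulting 2-D array back into the same columns of df.
df[num_df.columns] = norm.cdf(num_df.to_numpy())

print(df)
```

Assigning through df[columns] replaces the column data inside the existing df object, which is the practical equivalent of the in-place update asked for, even though it is an explicit assignment rather than an inplace=True flag.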

