pandas: how to apply a function to a DataFrame in place

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28661258/

Time: 2020-09-13 22:58:28  Source: igfitidea

How to apply function to dataframe in place

Tags: python, pandas, scipy, vectorization

Asked by hlin117

Is there a way I could use a scipy function like norm.cdf in place on a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?

The background is that I have a table of z-score values that I would like to convert to CDF values of the normal distribution. I'm currently using norm.cdf from scipy for this.

I'm currently manipulating a dataframe that also contains non-numeric values.

      Name      Val1      Val2      Val3      Val4 
0        A -1.540369 -0.077779  0.979606 -0.667112   
1        B -0.787154  0.048412  0.775444 -0.510904   
2        C -0.477234  0.414388  1.250544 -0.411658   
3        D -1.430851  0.258759  1.247752 -0.883293   
4        E -0.360181  0.485465  1.123589 -0.379157

(Making the Name variable an index is a solution, but in my actual dataset, the names are not alphabetical characters.)

To modify only the numeric data, I'm using df._get_numeric_data(), a private method that returns a dataframe containing only a dataframe's numeric data. However, there is no corresponding set function. Hence, if I call

norm.cdf(df._get_numeric_data())

this won't change df's original data.

I'm trying to circumvent this by applying norm.cdf to the numeric dataframe in place, so that it changes my original dataset.
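
The behavior described in the question can be checked with a minimal sketch. It assumes a two-row excerpt of the table above, and it uses the public select_dtypes method as a stand-in for the private _get_numeric_data; the variable names before and result are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Two-row excerpt of the question's table (assumed here for illustration).
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Val1": [-1.540369, -0.787154],
    "Val2": [-0.077779, 0.048412],
})
before = df.copy()

# norm.cdf computes a new NumPy array; it does not write into its input.
numeric = df.select_dtypes(include="number")
result = norm.cdf(numeric.to_numpy())

print(type(result))       # a NumPy ndarray, not a DataFrame
print(df.equals(before))  # True: df still holds the original z-scores
```

Because the call only returns a new array, the result has to be assigned back to the dataframe explicitly for df to change.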

Answered by Andy Hayden

I think I would prefer select_dtypes over _get_numeric_data:

In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157

Although apply doesn't offer an inplace option, you could do something like the following (which I would argue is more explicit anyway):

num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
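
For reference, here is a self-contained sketch of the same assignment, assuming the five-row table from the question and the usual imports (numpy as np, pandas as pd, norm from scipy.stats). The name num_cols is illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Rebuild the example table from the question.
df = pd.DataFrame({
    "Name": list("ABCDE"),
    "Val1": [-1.540369, -0.787154, -0.477234, -1.430851, -0.360181],
    "Val2": [-0.077779, 0.048412, 0.414388, 0.258759, 0.485465],
    "Val3": [0.979606, 0.775444, 1.250544, 1.247752, 1.123589],
    "Val4": [-0.667112, -0.510904, -0.411658, -0.883293, -0.379157],
})

# Pick out the numeric columns, leaving "Name" untouched.
num_cols = df.select_dtypes(include=[np.number]).columns

# Overwrite the z-scores with their standard normal CDF values.
df[num_cols] = norm.cdf(df[num_cols].to_numpy())

print(df)
```

Strictly speaking, this assigns a newly computed array back into the selected columns rather than mutating the underlying buffer, but df ends up holding the CDF values, which is the effect the question asks for.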