pandas: rounding up one column in a pandas dataframe

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/41303189/

Date: 2020-09-14 02:40:07  Source: igfitidea

Rounding up one column in pandas dataframe

Tags: python, python-3.x, pandas, python-3.5

Asked by Wessi

I have a pandas dataframe df that looks like this:

   no_obs  price_cleaning  house_size
0       1             585          30
1       1             585          40
2       1             585          43
3       1             650          43
4       1             633          44
5       1             650          45
6       2             585          50
7       1             633          50
8       1             650          50
9       2             750          50

I want to round up the values in the price_cleaning column with this function:

import math

def roundup(x):
    return int(math.ceil(x / 10.0)) * 10

I have tried the solution from this answer (Applying function to Pandas dataframe by column):

cols = [col for col in df.columns if col != 'price_cleaning']
df[cols] = df[cols].apply(roundup)

I get the following error: TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index no_obs')

Can anyone help me understand why this is not working? How do I apply the roundup function to the column? Any help is much appreciated.

Answered by Zero

I'd vectorize it like this:

In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)

In [299]: df
Out[299]:
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750
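
For a self-contained version of the session above, the following sketch rebuilds the sample frame from the question and applies the same expression. The imports and the DataFrame construction are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "no_obs": [1, 1, 1, 1, 1, 1, 2, 1, 1, 2],
    "price_cleaning": [585, 585, 585, 650, 633, 650, 585, 633, 650, 750],
    "house_size": [30, 40, 43, 43, 44, 45, 50, 50, 50, 50],
})

# Divide by 10, take the ceiling element-wise, scale back up, and cast to int.
df["p"] = (np.ceil(df["price_cleaning"] / 10) * 10).astype(int)

print(df["p"].tolist())
```

Assigning the result back to df["price_cleaning"] instead of a new column p would overwrite the original values in place.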

Timings for 10K rows: the vectorized method is roughly 18x faster than apply (436 μs vs. 7.86 ms).

In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 μs per loop

In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop

In [333]: dff.shape
Out[333]: (10000, 4)

At least in this case, the performance gap will increase with more rows.
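
The comparison can be reproduced outside IPython with the standard timeit module. This is only a sketch: the answer does not show how dff was built, so the random 10,000-row frame here is a hypothetical stand-in, and the absolute timings will differ by machine and pandas version.

```python
import math
import timeit

import numpy as np
import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Hypothetical stand-in for dff: 10,000 random integer prices.
rng = np.random.default_rng(0)
dff = pd.DataFrame({"price_cleaning": rng.integers(500, 800, size=10_000)})


def vectorized():
    # One NumPy call over the whole column.
    return (np.ceil(dff["price_cleaning"] / 10) * 10).astype(int)


def row_wise():
    # One Python-level function call per value.
    return dff["price_cleaning"].apply(roundup)


# Both approaches should produce the same values.
same = vectorized().tolist() == row_wise().tolist()

t_vec = timeit.timeit(vectorized, number=20)
t_apply = timeit.timeit(row_wise, number=20)
print(f"same result: {same}, vectorized: {t_vec:.4f}s, apply: {t_apply:.4f}s")
```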

Answered by Boud

Your column filter is inverted: it selects every column except price_cleaning. Do this instead:

cols = [col for col in df.columns if col == 'price_cleaning']

Now, if you need to clean up only one column, there is no need to create cols. Just do:

df['price_cleaning'] = df['price_cleaning'].apply(roundup)
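
As a sketch of why the original attempt raised an error: DataFrame.apply passes each column to the function as a whole Series, and math.ceil accepts only a single number, whereas Series.apply calls the function once per value. The two-row frame below is a made-up stand-in for df.

```python
import math

import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Made-up two-row stand-in for the question's df.
df = pd.DataFrame({"no_obs": [1, 2], "price_cleaning": [585, 633]})

# DataFrame.apply hands roundup an entire column (a Series), which math.ceil rejects.
try:
    df[["no_obs"]].apply(roundup)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Series.apply calls roundup once per scalar value, so it works.
df["price_cleaning"] = df["price_cleaning"].apply(roundup)
print(df["price_cleaning"].tolist())
```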

Answered by erasmortg

This might work:

>>> df['price_cleaning_ceiling'] = df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)

Answered by Fabio Lamanna

I think you can use apply and lambda as:

In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)

In [7]: df
Out[7]: 
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750