Original source: http://stackoverflow.com/questions/41303189/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Rounding up one column in pandas dataframe
Asked by Wessi
I have a pandas dataframe df that looks like this:
no_obs price_cleaning house_size
0 1 585 30
1 1 585 40
2 1 585 43
3 1 650 43
4 1 633 44
5 1 650 45
6 2 585 50
7 1 633 50
8 1 650 50
9 2 750 50
I want to round up the values in the price_cleaning column with this function:
import math

def roundup(x):
    return int(math.ceil(x / 10.0)) * 10
I have tried the solution from this answer (Applying function to Pandas dataframe by column):
cols = [col for col in df.columns if col != 'price_cleaning']
df[cols] = df[cols].apply(roundup)
I get the following error: TypeError: ("cannot convert the series to ", 'occurred at index no_obs')
Can anyone help me understand why this is not working? How do I apply the roundup function to the column? Any help is much appreciated.
Answered by Zero
I'd vectorize it like this:
In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)
In [299]: df
Out[299]:
no_obs price_cleaning house_size p
0 1 585 30 590
1 1 585 40 590
2 1 585 43 590
3 1 650 43 650
4 1 633 44 640
5 1 650 45 650
6 2 585 50 590
7 1 633 50 640
8 1 650 50 650
9 2 750 50 750
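For reference, a self-contained version of the vectorized approach might look like the sketch below. The sample frame reproduces the data from the question and the column name p follows the answer above; the imports and the explicit DataFrame construction are assumptions added so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "no_obs": [1, 1, 1, 1, 1, 1, 2, 1, 1, 2],
    "price_cleaning": [585, 585, 585, 650, 633, 650, 585, 633, 650, 750],
    "house_size": [30, 40, 43, 43, 44, 45, 50, 50, 50, 50],
})

# Divide, take the ceiling, and scale back up on the whole column at once,
# then cast the float result back to integers.
df["p"] = (np.ceil(df["price_cleaning"] / 10) * 10).astype(int)

print(df["p"].tolist())
```

Values that are already multiples of 10 (650, 750) are left unchanged, which matches the output shown above.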
Timings for 10K rows: the vectorized method is more than 15x faster than apply (436 μs vs. 7.86 ms per loop).
In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 μs per loop
In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop
In [333]: dff.shape
Out[333]: (10000, 4)
At least in this case, the performance gap will widen as the number of rows grows.
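Outside IPython, a comparison along these lines could be sketched with the standard timeit module. The benchmark data here is hypothetical (10,000 random prices, not the answerer's dff), and the absolute numbers will differ by machine and library version, so no timings are quoted.

```python
import math
import timeit

import numpy as np
import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Hypothetical benchmark data: 10,000 random integer prices.
rng = np.random.default_rng(0)
prices = pd.Series(rng.integers(500, 800, size=10_000))

vectorized = (np.ceil(prices / 10) * 10).astype(int)
applied = prices.apply(roundup)

# Both approaches should agree before their speed is compared.
assert (vectorized == applied).all()

loops = 100
t_vec = timeit.timeit(lambda: (np.ceil(prices / 10) * 10).astype(int), number=loops)
t_apply = timeit.timeit(lambda: prices.apply(roundup), number=loops)

print(f"vectorized: {t_vec / loops * 1e3:.3f} ms per loop")
print(f"apply:      {t_apply / loops * 1e3:.3f} ms per loop")
```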
Answered by Boud
Your column filter is inverted: it selects every column except price_cleaning. Do this instead:
cols = [col for col in df.columns if col == 'price_cleaning']
Now, if you only need to clean up one column, there is no need to create cols. Just do:
df['price_cleaning'] = df['price_cleaning'].apply(roundup)
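The difference between the two calls can be illustrated with a small sketch. It assumes a two-row frame made up for the example: DataFrame.apply passes each column to roundup as a whole Series, which math.ceil cannot convert to a single float, whereas Series.apply calls roundup once per element.

```python
import math

import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Small illustrative frame (made up for this example).
df = pd.DataFrame({"no_obs": [1, 2], "price_cleaning": [585, 633]})

# DataFrame.apply hands roundup an entire column (a Series); math.ceil
# cannot convert a multi-element Series to a float, so a TypeError is raised.
try:
    df[["no_obs"]].apply(roundup)
    raised = False
except TypeError:
    raised = True

# Series.apply calls roundup on each value, so x is a scalar every time.
df["price_cleaning"] = df["price_cleaning"].apply(roundup)

print(raised, df["price_cleaning"].tolist())
```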
Answered by erasmortg
This might work:
>>> df['price_cleaning_ceiling']= df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)
Answered by Fabio Lamanna
I think you can use apply and lambda as:
In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)
In [7]: df
Out[7]:
no_obs price_cleaning house_size p
0 1 585 30 590
1 1 585 40 590
2 1 585 43 590
3 1 650 43 650
4 1 633 44 640
5 1 650 45 650
6 2 585 50 590
7 1 633 50 640
8 1 650 50 650
9 2 750 50 750