pandas: round a column in a pandas DataFrame

Question

Asked by Wessi

I have a pandas DataFrame df that looks like this:

          no_obs  price_cleaning  house_size
0         1             585          30
1         1             585          40
2         1             585          43
3         1             650          43
4         1             633          44
5         1             650          45
6         2             585          50
7         1             633          50
8         1             650          50
9         2             750          50

I want to round up the values in the price_cleaning column with this function:

import math
def roundup(x): return int(math.ceil(x / 10.0)) * 10

I have tried the solution from this answer (Applying function to Pandas dataframe by column):

cols = [col for col in df.columns if col != 'price_cleaning']
df[cols] = df[cols].apply(roundup)

I get the following error: TypeError: ("cannot convert the series to ", 'occurred at index no_obs')

Can anyone help me understand why this is not working? How do I apply the roundup function to the column? Any help is much appreciated.

Answer 1

Answered by Zero

I'd vectorize it like this:

In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)

In [299]: df
Out[299]:
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750
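
A self-contained version of the above, as a sketch: the frame is rebuilt from the sample data in the question, and the new column name p follows this answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "no_obs": [1, 1, 1, 1, 1, 1, 2, 1, 1, 2],
    "price_cleaning": [585, 585, 585, 650, 633, 650, 585, 633, 650, 750],
    "house_size": [30, 40, 43, 43, 44, 45, 50, 50, 50, 50],
})

# np.ceil operates on the whole column at once, so no Python-level loop is needed.
df["p"] = (np.ceil(df["price_cleaning"] / 10) * 10).astype(int)
print(df)
```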

For 10K rows, the vectorized method is more than 15x faster than apply (436 μs vs. 7.86 ms):

In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 μs per loop

In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop

In [333]: dff.shape
Out[333]: (10000, 4)

At least in this case, the performance gap will widen as the number of rows grows.
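
The comparison can be reproduced outside IPython with a rough sketch like the one below. The dff here is a hypothetical stand-in built from random integers (the original dff is not shown), and absolute timings will differ by machine and pandas version.

```python
import math
import timeit

import numpy as np
import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Hypothetical stand-in for dff: 10,000 random integer prices.
rng = np.random.default_rng(0)
dff = pd.DataFrame({"price_cleaning": rng.integers(500, 800, size=10_000)})


def vectorized():
    return (np.ceil(dff["price_cleaning"] / 10) * 10).astype(int)


def with_apply():
    return dff["price_cleaning"].apply(roundup)


# Both approaches should produce the same values.
assert vectorized().tolist() == with_apply().tolist()

# Average time per call over 20 runs of each approach.
t_vec = timeit.timeit(vectorized, number=20) / 20
t_apply = timeit.timeit(with_apply, number=20) / 20
print(f"vectorized: {t_vec * 1e6:.0f} us per call")
print(f"apply:      {t_apply * 1e6:.0f} us per call")
```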

Answer 2

Answered by Boud

Your column filter is inverted: it selects every column except price_cleaning. Do this instead:

cols = [col for col in df.columns if col == 'price_cleaning']

Now, if you only need to clean up one column, there is no need to create cols. Just do:

df['price_cleaning'] = df['price_cleaning'].apply(roundup)
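
A likely explanation for the original TypeError, sketched under the assumption that roundup is defined as in the question: DataFrame.apply passes each column to the function as a whole Series, which math.ceil cannot convert to a single number, whereas Series.apply calls the function once per value. The three-row frame below is made up for illustration.

```python
import math

import pandas as pd


def roundup(x):
    return int(math.ceil(x / 10.0)) * 10


# Made-up three-row frame, for illustration only.
df = pd.DataFrame({"price_cleaning": [585, 633, 650]})

# DataFrame.apply hands roundup an entire column (a Series), which math.ceil rejects.
try:
    df[["price_cleaning"]].apply(roundup)
    frame_apply_failed = False
except TypeError:
    frame_apply_failed = True

# Series.apply hands roundup one scalar at a time, so it works.
rounded = df["price_cleaning"].apply(roundup)

print(frame_apply_failed)
print(rounded.tolist())
```

This is why selecting the single column as a Series, as in the line above, avoids the error even without changing roundup.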

Answer 3

Answered by erasmortg

This might work:

>>> df['price_cleaning_ceiling'] = df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)

Answer 4

Answered by Fabio Lamanna

I think you can use apply and a lambda:

In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)

In [7]: df
Out[7]: 
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750
