
Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/42628577/

Time: 2020-09-14 03:07:27 Source: igfitidea

Fastest way to cast all dataframe columns to float - pandas astype slow

Tags: python, performance, pandas, numpy, dataframe

Asked by elleciel

Is there a faster way to cast all columns of a pandas dataframe to a single type? This seems particularly slow:


df = df.apply(lambda x: x.astype(np.float64), axis=1)

I suspect there's not much I can do about it because of the memory allocation overhead of numpy.ndarray.astype.

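The allocation the question mentions is only unavoidable when the dtype actually changes. A minimal sketch of `numpy.ndarray.astype` with `copy=False` (the arrays here are illustrative): NumPy hands back the input array when no conversion is needed, but must allocate a new buffer for an int-to-float cast regardless of the flag.

```python
import numpy as np

# A float64 array: casting it to float64 again needs no new buffer.
a = np.arange(6, dtype=np.float64).reshape(2, 3)

# With copy=False, NumPy returns the input itself when no conversion is required.
same = a.astype(np.float64, copy=False)
print(same is a)

# An int64 array has to be converted, so a new float64 buffer is allocated
# even though copy=False was requested.
b = np.arange(6, dtype=np.int64).reshape(2, 3)
converted = b.astype(np.float64, copy=False)
print(converted is b, converted.dtype)
```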

I've also tried pd.to_numeric but it arbitrarily chooses to cast a few of my columns into int types instead.

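The int columns come from `pd.to_numeric` inferring a dtype per column rather than choosing at random. A small sketch with hypothetical string data: whole-number strings come back as an integer dtype, and a follow-up `astype(np.float64)` forces every column to float.

```python
import numpy as np
import pandas as pd

# Hypothetical string data: column "a" looks like whole numbers, "b" does not.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["1.5", "2.0", "3.25"]})

# pd.to_numeric infers a dtype per column, so "a" becomes an integer dtype
# while "b" becomes float64.
inferred = df.apply(pd.to_numeric)
print(inferred.dtypes)

# Casting afterwards imposes a single dtype on every column.
as_float = inferred.astype(np.float64)
print(as_float.dtypes)
```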

Answer by miradulo

No need for apply, just use DataFrame.astype directly.


df.astype(np.float64)

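apply-ing is also going to give you a pretty bad performance hit, because the lambda is called once per column (axis=0) or once per row (axis=1) instead of the whole frame being cast in a single call.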

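Switching from `apply` to `astype` changes only the cost, not the result. A quick sketch on a small hypothetical frame with custom labels checks that all three variants yield identical values, labels and dtypes.

```python
import numpy as np
import pandas as pd

# Small hypothetical integer frame with non-default labels.
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["x", "y", "z"], columns=list("abcd"))

direct = df.astype(np.float64)                                  # one call for the frame
per_col = df.apply(lambda col: col.astype(np.float64), axis=0)  # lambda runs per column
per_row = df.apply(lambda row: row.astype(np.float64), axis=1)  # lambda runs per row

# Raises AssertionError if values, labels or dtypes differ.
pd.testing.assert_frame_equal(direct, per_col)
pd.testing.assert_frame_equal(direct, per_row)
print(direct.dtypes)
```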

Example


df = pd.DataFrame(np.arange(10**7).reshape(10**4, 10**3))

%timeit df.astype(np.float64)
1 loop, best of 3: 288 ms per loop

%timeit df.apply(lambda x: x.astype(np.float64), axis=0)
1 loop, best of 3: 748 ms per loop

%timeit df.apply(lambda x: x.astype(np.float64), axis=1)
1 loop, best of 3: 2.95 s per loop
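`%timeit` only works inside IPython. A rough plain-Python equivalent using the standard `timeit` module is sketched below; the frame is deliberately smaller than the answer's 10**4 x 10**3 one so it finishes quickly, and absolute timings will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the answer's frame (500 x 1000 here) to keep the run short.
df = pd.DataFrame(np.arange(5 * 10**5).reshape(500, 10**3))

candidates = {
    "astype": lambda: df.astype(np.float64),
    "apply axis=0": lambda: df.apply(lambda x: x.astype(np.float64), axis=0),
    "apply axis=1": lambda: df.apply(lambda x: x.astype(np.float64), axis=1),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats with one call each, mirroring "1 loop, best of 3".
    results[name] = min(timeit.repeat(fn, repeat=3, number=1))
    print(f"{name}: {results[name] * 1e3:.1f} ms")
```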

Answer by Divakar

One efficient way would be to work with array data and cast it back to a dataframe, like so -


pd.DataFrame(df.values.astype(np.float64))
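One caveat: `pd.DataFrame(df.values...)` discards the original index and column labels and replaces them with default integer ones. A sketch that keeps the array route but passes the labels back in, assuming pandas 0.24 or later for `DataFrame.to_numpy(dtype=...)`. For mixed-dtype frames, the extracted array is first built with a common (possibly object) dtype, so the speed-up may be smaller there.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with custom labels that the bare array round trip would drop.
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["r1", "r2"], columns=["a", "b", "c"])

# to_numpy(dtype=...) casts while extracting; the labels are reattached explicitly.
out = pd.DataFrame(df.to_numpy(dtype=np.float64),
                   index=df.index, columns=df.columns)
print(out)
```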

Runtime test -


In [144]: df = pd.DataFrame(np.random.randint(11,99,(5000,5000)))

In [145]: %timeit df.astype(np.float64) # @Mitch's soln
10 loops, best of 3: 121 ms per loop

In [146]: %timeit pd.DataFrame(df.values.astype(np.float64))
10 loops, best of 3: 42.5 ms per loop

The casting back to dataframe wasn't that costly -


In [147]: %timeit df.values.astype(np.float64)
10 loops, best of 3: 42.3 ms per loop # Casting to dataframe cost 0.2ms
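The 0.2 ms figure above is inferred by subtracting two timings. A sketch that times the NumPy cast and the DataFrame wrapping separately with the standard `timeit` module is below. It uses a 2000 x 2000 frame (an assumption, smaller than the answer's 5000 x 5000) to limit memory use. Whether `pd.DataFrame(arr)` copies the array can differ between pandas versions, so the wrapping cost is worth re-measuring locally.

```python
import timeit

import numpy as np
import pandas as pd

# Same value range as the answer's test frame, smaller shape.
df = pd.DataFrame(np.random.randint(11, 99, (2000, 2000)))
arr = df.values.astype(np.float64)

def best_ms(fn, repeat=3, number=10):
    # Best per-call time over several repeats, in milliseconds.
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e3

cast_only = best_ms(lambda: df.values.astype(np.float64))
wrap_only = best_ms(lambda: pd.DataFrame(arr))
print(f"cast: {cast_only:.2f} ms, wrap: {wrap_only:.2f} ms")
```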