Python: convert pandas dataframe to numpy array - which approach to prefer?

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49180018/

Time: 2020-08-19 19:00:20  Source: igfitidea

Convert pandas dataframe to numpy array - which approach to prefer?

python, pandas, numpy

Asked by 00__00__00

I need to convert a large dataframe to a numpy array, preserving only numerical values and types. I know there are well-documented ways to do so.

So, which one should be preferred?

df.values
df.as_matrix()
pd.to_numeric(df)
... others ...

Decision factors:

  • efficiency

  • safe handling of NaN (np.nan) and other possibly unexpected values

  • numerical stability

Answered by jpp

The functions you mention serve different purposes.

  1. pd.to_numeric: Use this to convert types in your dataframe if your data is not currently stored in numeric form, or if you wish to cast to an optimal type via downcast='float' or downcast='integer'.

  2. pd.DataFrame.to_numpy() (v0.24+) or pd.DataFrame.values: Use this to retrieve a numpy array representation of your dataframe.

  3. pd.DataFrame.as_matrix: Do not use this. It is included only for backwards compatibility (it was deprecated in pandas 0.23 and removed in 1.0).
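
A minimal sketch of how the first two items can be combined, assuming a small hypothetical frame (the column names and values are invented for illustration). Note that pd.to_numeric accepts 1-D input such as a Series, so it is applied column by column here rather than to the whole dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column stored as strings, one float column with a NaN.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [0.5, np.nan, 2.5]})

# pd.to_numeric works on 1-D input, so apply it to each column.
# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Optional: downcast a single column to the smallest integer type that fits.
small = pd.to_numeric(df["a"], downcast="integer")

# Retrieve the numpy representation (pandas 0.24+); NaN is preserved.
arr = numeric.to_numpy()

print(numeric.dtypes)
print(small.dtype)
print(arr.dtype, arr.shape)
```

On older pandas versions without to_numpy(), numeric.values should give the same array.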

Answered by ascripter

Under the hood, a pandas.DataFrame is not much more than a numpy.array. The simplest and possibly fastest way is to use pandas.DataFrame.values.
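
As a rough illustration of the .values route, here is a sketch on a hypothetical frame (column names invented here). Selecting the numeric columns first with select_dtypes is one way to keep only numerical values, since a single string column would otherwise turn the whole result into an object array:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and non-numeric columns.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [10, 20, 30],
    "label": ["a", "b", "c"],
})

# Keep only the numeric columns before converting.
numeric_only = df.select_dtypes(include="number")

arr = numeric_only.values   # int64 and float64 columns share float64
mixed = df.values           # the string column forces dtype object

print(arr.dtype, mixed.dtype)
```

NaN entries pass through unchanged, so any cleaning (for example fillna or dropna) would have to happen before the conversion.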

DataFrame.values

Numpy representation of NDFrame

Notes

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a float64 dtype.
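
The upcasting described in these notes can be checked with a short sketch. The frames and column names below are hypothetical, chosen only to mirror the dtype pairs mentioned above. The int64/uint64 case is shown through NumPy's own promotion rule (np.promote_types) rather than through a dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical columns mirroring the dtype pairs in the notes.
floats = pd.DataFrame({
    "half": np.array([1.0, 2.0], dtype=np.float16),
    "single": np.array([3.0, 4.0], dtype=np.float32),
})
ints = pd.DataFrame({
    "small": np.array([1, 2], dtype=np.uint8),
    "medium": np.array([3, 4], dtype=np.int32),
})

float_dtype = floats.to_numpy().dtype   # float16 + float32
int_dtype = ints.to_numpy().dtype       # uint8 + int32

# NumPy's promotion rule for the int64/uint64 pair.
wide_dtype = np.promote_types(np.int64, np.uint64)

# float64 has a 53-bit significand, so integers above 2**53
# may not survive an upcast to float64 exactly.
big = np.int64(2**53 + 1)
roundtrip = int(np.float64(big))

print(float_dtype, int_dtype, wide_dtype)
print(int(big), roundtrip)
```

The last two lines relate to the numerical stability concern in the question: a silent upcast to float64 can change very large integer values, so it may be worth checking df.dtypes before converting.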
