Python 将熊猫数据帧转换为 numpy 数组 - 更喜欢哪种方法?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/49180018/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Convert pandas dataframe to numpy array - which approach to prefer?
提问by 00__00__00
I need to convert a large dataframe to a numpy array. Preserving only numerical values and types. I know there are well documented ways to do so.
我需要将大型数据帧转换为 numpy 数组。仅保留数值和类型。我知道有很好的记录方法可以做到这一点。
So, which one is to prefer?
那么,更喜欢哪一个呢?
df.values
df._as_matrix()
pd.to_numeric(df)
... others ...
Decision factor:
决定因素:
efficiency
safely operating on nan,np.nans, and other possible unexpected values
numerically stable
效率
在 nan、np.nans 和其他可能的意外值上安全运行
数值稳定
回答by jpp
The functions you mention serve different purposes.
您提到的功能用于不同的目的。
pd.to_numeric
: Use this to convert types in your dataframe if your data is not currently stored in numeric form orif you wish to cast as an optimal type viadowncast='float'
ordowncast='integer'
.pd.DataFrame.to_numpy()
(v0.24+) orpd.DataFrame.values
: Use this to retrievenumpy
array representation of your dataframe.pd.DataFrame.as_matrix
: Do not use this. It is included only for backwards compatibility.
pd.to_numeric
:如果您的数据当前未以数字形式存储,或者您希望通过downcast='float'
或 转换为最佳类型,请使用它来转换数据框中的类型downcast='integer'
。pd.DataFrame.to_numpy()
(v0.24+) 或pd.DataFrame.values
:使用它来检索numpy
数据帧的数组表示。pd.DataFrame.as_matrix
: 不要用这个。包含它只是为了向后兼容。
回答by ascripter
Under the hood, a pandas.DataFrame
is not much more than a numpy.array
. The simplest and possibly fastest way is to use pandas.DataFrame.values
在幕后, apandas.DataFrame
只不过是 a numpy.array
。最简单也可能是最快的方法是使用pandas.DataFrame.values
DataFrame.values
Numpy representation of NDFrame
Notes
The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a flot64 dtype.
DataFrame.values
NDFrame 的 Numpy 表示
笔记
dtype 将是一个较低的公分母 dtype(隐式向上转换);也就是说,如果 dtypes(甚至是数字类型)混合在一起,则将选择容纳所有类型的 dtypes。如果您不处理块,请小心使用它。
例如,如果 dtype 是 float16 和 float32,则 dtype 将向上转换为 float32。如果 dtypes 是 int32 和 uint8,则 dtype 将向上转换为 int32。根据 numpy.find_common_type 约定,混合 int64 和 uint64 将导致 flot64 dtype。