Python: converting a pandas DataFrame to a numpy array - which method is preferred?

Question

Asked by 000000

I need to convert a large dataframe to a numpy array, preserving only numerical values and types. I know there are well-documented ways to do so.

So, which one is preferred?

df.values
df.as_matrix()
pd.to_numeric(df)
... others ...

Decision factors:

efficiency
safe handling of NaN (np.nan) and other possibly unexpected values
numerical stability

Answer 1

Answered by jpp

The functions you mention serve different purposes.

pd.to_numeric: Use this to convert types in your dataframe if your data is not currently stored in numeric form, or if you wish to cast to an optimal type via downcast='float' or downcast='integer'.
pd.DataFrame.to_numpy() (v0.24+) or pd.DataFrame.values: Use this to retrieve a numpy array representation of your dataframe.
pd.DataFrame.as_matrix: Do not use this. It was kept only for backwards compatibility and has been removed in pandas 1.0 and later.
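
As a rough illustration of how these pieces might fit together, the sketch below cleans one column with pd.to_numeric, keeps only the numeric columns, and then calls to_numpy(). The sample frame and its column names are made up for the example, and using select_dtypes to drop non-numeric columns is one possible approach rather than something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: numbers stored as strings, a float column
# with a missing value, and a non-numeric column.
df = pd.DataFrame({
    "a": ["1", "2", "oops"],
    "b": [0.5, np.nan, 2.5],
    "label": ["x", "y", "z"],
})

# pd.to_numeric works on one column (Series) at a time.
# errors="coerce" turns unparseable entries into NaN instead of raising.
df["a"] = pd.to_numeric(df["a"], errors="coerce")

# Optionally ask pandas for a smaller float dtype for a column.
b_small = pd.to_numeric(df["b"], downcast="float")

# Keep only the numeric columns, then extract the numpy array.
numeric = df.select_dtypes(include="number")
arr = numeric.to_numpy()  # on pandas older than 0.24, numeric.values

print(arr.dtype, arr.shape)
```

Missing values stay in the array as NaN, so any downstream computation still has to decide how to treat them (for example with np.nanmean or by dropping rows first).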

Answer 2

Answered by ascripter

Under the hood, a pandas.DataFrame is not much more than a numpy.array. The simplest and possibly fastest way is to use pandas.DataFrame.values:

DataFrame.values
Numpy representation of NDFrame
Notes
The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a float64 dtype.
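
The upcasting described in these notes can be checked with a small sketch. The frames and column names below are invented for the illustration, and to_numpy() is used here on the assumption that it follows the same common-dtype rule as .values.

```python
import numpy as np
import pandas as pd

# float16 + float32 columns: the result should use the wider float type.
mixed_floats = pd.DataFrame({
    "x": np.array([1.0, 2.0], dtype=np.float16),
    "y": np.array([3.0, 4.0], dtype=np.float32),
})

# int32 + uint8 columns: int32 can hold every uint8 value.
mixed_ints = pd.DataFrame({
    "x": np.array([1, 2], dtype=np.int32),
    "y": np.array([3, 4], dtype=np.uint8),
})

# A numeric column next to a string column falls back to object dtype.
with_text = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

float_dtype = mixed_floats.to_numpy().dtype
int_dtype = mixed_ints.to_numpy().dtype
object_dtype = with_text.to_numpy().dtype

print(float_dtype, int_dtype, object_dtype)
```

The last case is the one to watch when only numerical values are wanted: a single non-numeric column turns the whole array into object dtype, so it is usually worth selecting the numeric columns before converting.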