Pandas dataset into an array for modelling in Scikit-Learn
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original source: http://stackoverflow.com/questions/22562540/
Asked by user40465
Can we run scikit-learn models on Pandas DataFrames or do we need to convert DataFrames into NumPy arrays?
Answered by Akavall
You can use a pandas.DataFrame with sklearn directly, for example:
import pandas as pd
from sklearn.cluster import KMeans

# Nine two-feature samples, one tuple per row
data = [(0.2, 10),
        (0.3, 12),
        (0.24, 14),
        (0.8, 30),
        (0.9, 32),
        (0.85, 33.3),
        (0.91, 31),
        (0.1, 15),
        (-0.23, 45)]
p_df = pd.DataFrame(data)

# The DataFrame is passed to fit() as-is, with no manual conversion
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(p_df)
Result:
>>> kmeans.labels_
array([0, 0, 0, 2, 2, 2, 2, 0, 1], dtype=int32)
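Since fit() accepts the DataFrame, the resulting labels can also be written back next to the rows they describe. The following is only a minimal sketch: the column names feature_a and feature_b and the random_state=0 seed are assumptions added for illustration and are not part of the original answer. Cluster numbers are arbitrary, so the exact integers may differ from the output shown above.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10), (0.3, 12), (0.24, 14),
        (0.8, 30), (0.9, 32), (0.85, 33.3),
        (0.91, 31), (0.1, 15), (-0.23, 45)]

# Hypothetical column names, chosen only to make the frame readable
p_df = pd.DataFrame(data, columns=["feature_a", "feature_b"])

# random_state is an assumption added here so repeated runs agree
kmeans = KMeans(init="k-means++", n_clusters=3, n_init=10, random_state=0)
kmeans.fit(p_df)

# labels_ has one entry per row, in row order, so it aligns with the frame
labeled = p_df.assign(cluster=kmeans.labels_)
print(labeled)
```

Using assign returns a new DataFrame and leaves p_df unchanged, which keeps the feature matrix free of the label column if the model is refit later.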
Answered by Greg
Pandas DataFrames are very good at acting like NumPy arrays when they need to. If in doubt, you can always use the values attribute to get a NumPy representation: df.values will give you a NumPy array of the values in DataFrame df.
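As a rough illustration of the point about values, the sketch below converts a small DataFrame explicitly and checks that a model fitted on the array assigns the same labels as one fitted on the DataFrame. The sample data, the column names x and y, and the fixed random_state are assumptions made for this example. It also uses DataFrame.to_numpy(), which newer pandas versions (0.24 and later) offer alongside values.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Small made-up frame; column names are illustrative only
df = pd.DataFrame({"x": [0.2, 0.3, 0.8, 0.9],
                   "y": [10.0, 12.0, 30.0, 32.0]})

as_values = df.values      # attribute mentioned in the answer
as_numpy = df.to_numpy()   # method form available in pandas >= 0.24

# Two separate estimators with the same seed, one per input type
labels_from_df = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df).labels_
labels_from_array = KMeans(n_clusters=2, n_init=10, random_state=0).fit(as_numpy).labels_

same_labels = bool((labels_from_df == labels_from_array).all())
print(type(as_values).__name__, as_numpy.shape, same_labels)
```

The explicit conversion is mostly useful when a function insists on a plain array; for estimators like KMeans the DataFrame can usually be passed as it is.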

