Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22562540/

Date: 2020-09-13 21:50:20  Source: igfitidea

Pandas dataset into an array for modelling in Scikit-Learn

python, pandas, scikit-learn

Asked by user40465

Can we run scikit-learn models on Pandas DataFrames or do we need to convert DataFrames into NumPy arrays?

Answered by Akavall

You can use pandas.DataFrame with sklearn, for example:

import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10),
        (0.3, 12),
        (0.24, 14),
        (0.8, 30),
        (0.9, 32),
        (0.85, 33.3),
        (0.91, 31),
        (0.1, 15),
        (-0.23, 45)]

p_df = pd.DataFrame(data)
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(p_df)

Result:

>>> kmeans.labels_
array([0, 0, 0, 2, 2, 2, 2, 0, 1], dtype=int32)
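
As a quick check of the point above, here is a minimal sketch (not part of the original answer) that fits the same model once on the DataFrame and once on its NumPy conversion, then compares the labels. The column names "x" and "y" and the fixed random_state=0 are assumptions added here so that the two runs are comparable; the cluster numbering itself is arbitrary and may differ from the output shown above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = [(0.2, 10), (0.3, 12), (0.24, 14),
        (0.8, 30), (0.9, 32), (0.85, 33.3),
        (0.91, 31), (0.1, 15), (-0.23, 45)]

# Column names are hypothetical; they are not in the original example.
df = pd.DataFrame(data, columns=["x", "y"])

# Same hyperparameters and seed for both fits, so only the input type differs.
km_df = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)
km_np = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df.to_numpy())

# True when the DataFrame and the array lead to identical cluster labels.
same = np.array_equal(km_df.labels_, km_np.labels_)
print(same)
```

If the comparison holds, passing the DataFrame directly is equivalent to converting it first, at least for an all-numeric frame like this one.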

Answered by Greg

Pandas DataFrames are very good at acting like NumPy arrays when they need to. If in doubt, you can always use the values attribute to get a NumPy representation (df.values will give you a NumPy array of the values in DataFrame df).
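
A minimal sketch of that conversion, using a small made-up DataFrame (the column names and values are illustrative only). It also shows DataFrame.to_numpy(), which the pandas documentation recommends over .values in pandas 0.24 and later, and what happens when the columns have mixed types.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame, used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

arr_values = df.values        # attribute described in the answer above
arr_to_numpy = df.to_numpy()  # method form, available in pandas >= 0.24

print(type(arr_values), arr_values.shape)
print(np.array_equal(arr_values, arr_to_numpy))

# With mixed column types the result falls back to dtype=object,
# which most scikit-learn estimators will not accept as numeric input.
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(mixed.to_numpy().dtype)
```

For modelling it is usually worth selecting or encoding the numeric columns before converting, so that the resulting array has a numeric dtype.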