Convert a Python xgboost DMatrix to a numpy ndarray or pandas DataFrame

Question

Asked by howard

I'm following an xgboost example from their main repository: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/basic_walkthrough.py#L64

In this example, they read the files directly into a DMatrix:

import xgboost as xgb

dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')

I looked at the DMatrix code, and there seems to be no way to take a quick look at how the data is structured, as we normally do in pandas with pandas.DataFrame.head().

The xgboost documentation mentions that we can convert a numpy.ndarray to an xgboost.DMatrix. Can we somehow convert it back, from xgboost.DMatrix to numpy.ndarray, or perhaps to a pandas DataFrame? I don't see a way to do this in their code, but perhaps someone knows one?

Or is there a way to take a quick look at what the data in an xgboost.DMatrix looks like?

Thanks in advance, Howard

Answer 1

Answered by Peter

To elaborate on @jcaine's answer, you can use sklearn to load the files, then convert them to ordinary numpy arrays:

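from sklearn.datasets import load_svmlight_file

# load_svmlight_file returns a tuple: (sparse CSR feature matrix, label array)
train_data = load_svmlight_file('demo/data/agaricus.txt.train')
X = train_data[0].toarray()  # dense numpy array of features
y = train_data[1]            # numpy array of labels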

I haven't found a way to convert directly from a DMatrix to numpy arrays yet.
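Once the data is in a numpy array, getting a pandas-style head() view is straightforward. Below is a minimal sketch of that step, not something taken from the xgboost demo. It writes a tiny made-up libsvm-format sample to a temporary file so it can run anywhere; the sample rows, the `f0`, `f1`, ... column names, and the `label` column name are all assumptions for illustration. For the real data you would pass the path to agaricus.txt.train instead.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_svmlight_file

# Hypothetical three-row sample in libsvm format: "<label> <index>:<value> ..."
sample = "1 1:1 3:1\n0 2:1 4:1\n1 1:1 4:1\n"

# Write the sample to a temporary file, since load_svmlight_file reads from disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    path = f.name

try:
    X_sparse, y = load_svmlight_file(path)
finally:
    os.remove(path)

# Densify the sparse matrix and wrap it in a DataFrame with made-up column names.
columns = [f"f{i}" for i in range(X_sparse.shape[1])]
df = pd.DataFrame(X_sparse.toarray(), columns=columns)
df["label"] = y

print(df.head())
```

Note that toarray() makes the matrix dense, so for a large, very sparse file this may use a lot of memory; inspecting only the first few rows of the sparse matrix (for example `X_sparse[:5].toarray()`) avoids that.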

Answer 2

Answered by jcaine

Howard,

I believe that xgb.DMatrix assumes the libsvm data format when given a file path. You can get this data into a sparse CSR matrix using scikit-learn's load_svmlight_file: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html.

You can then partition the response variable and the features using the example at the bottom of the page.
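If you only want a quick look at the raw file, it may help to know that the libsvm text format stores one row per line as a label followed by `index:value` pairs for the non-zero features. Here is a rough sketch, using only the standard library, of reading the first few rows that way. The `head_libsvm` helper and the sample lines are hypothetical, and the sketch assumes plain rows with no comments or `qid` fields.

```python
from itertools import islice


def head_libsvm(lines, n=5):
    """Parse up to the first n lines of libsvm-format text.

    Returns a list of (label, {feature_index: value}) pairs.
    """
    rows = []
    for line in islice(lines, n):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        label = float(parts[0])
        features = {}
        for item in parts[1:]:
            index, value = item.split(":")
            features[int(index)] = float(value)
        rows.append((label, features))
    return rows


# Made-up sample rows; with a real file, pass an open file object instead.
sample = ["1 3:1 10:1 11:1", "0 3:1 9:1 20:1"]
for label, features in head_libsvm(sample):
    print(label, features)
```

With a real file, something like `with open(path) as f: head_libsvm(f)` would read only the first few lines rather than loading everything.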