Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/37309096/
Convert a Python xgboost DMatrix to a numpy ndarray or pandas DataFrame
Asked by howard
I'm following an xgboost example from their main git repository: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/basic_walkthrough.py#L64
In this example they read the files directly into a DMatrix:
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
I looked at the DMatrix code, and there seems to be no way to take a quick look at how the data is structured, as we normally do in pandas with pandas.DataFrame.head().
The xgboost documentation mentions that we can convert a numpy.ndarray to an xgboost.DMatrix. Can we somehow convert it back, from xgboost.DMatrix to numpy.ndarray, or perhaps to a pandas DataFrame? I don't see a way to do it in their code, but perhaps someone knows one?
Or is there a way to take a quick look at what the data in an xgboost.DMatrix looks like?
Thanks in advance, Howard
Answered by Peter
To elaborate on @jcaine's answer, you can use sklearn to load the files, then convert them to ordinary numpy arrays:
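from sklearn.datasets import load_svmlight_file

# load_svmlight_file returns a tuple (X, y):
# X is a scipy.sparse CSR matrix of features, y is a numpy array of labels
train_data = load_svmlight_file('demo/data/agaricus.txt.train')
X = train_data[0].toarray()  # dense numpy array of features
y = train_data[1]            # label vector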
I haven't found a way to directly convert from dMatrix to numpy arrays yet.
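To get something like pandas.DataFrame.head(), the file-based route above can be extended by wrapping the dense array in a DataFrame. This is a minimal sketch, not part of the original answer: it writes a tiny made-up libsvm file to a temporary directory so it runs without the agaricus data, and the file contents and the "label" column name are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_svmlight_file

# A tiny made-up libsvm file (label followed by index:value pairs),
# standing in for agaricus.txt.train.
sample = "1 1:1 3:1\n0 2:1 3:1\n1 1:1 2:1\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt.train")
with open(path, "w") as f:
    f.write(sample)

X, y = load_svmlight_file(path)  # X: scipy.sparse CSR matrix, y: numpy array
df = pd.DataFrame(X.toarray())   # dense copy; fine for small data
df["label"] = y                  # hypothetical column name for the labels
print(df.head())
```

Note that toarray() materializes every zero, so for wide, very sparse data it may be better to inspect only a slice of rows, for example X[:5].toarray().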
Answered by jcaine
Howard,
I believe that xgb.DMatrix assumes the libsvm data format. You can load this data into a sparse CSR matrix using scikit-learn's load_svmlight_file: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html
You can then partition the response variable and the features using the example at the bottom of the page.
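For readers unfamiliar with the libsvm format, each line holds a label followed by sparse index:value pairs. The sketch below parses such lines by hand and separates the response from the features. The helper name parse_libsvm_line and the sample lines are hypothetical, and the code assumes 1-based feature indices; real files may be zero-based, which load_svmlight_file handles through its zero_based parameter.

```python
def parse_libsvm_line(line, n_features):
    """Parse one 'label index:value ...' line into (label, dense_row)."""
    parts = line.split()
    label = float(parts[0])
    row = [0.0] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)  # assumption: 1-based indices
    return label, row


# Made-up sample lines, for illustration only.
lines = ["1 1:1 3:1", "0 2:1 3:1"]
parsed = [parse_libsvm_line(line, n_features=3) for line in lines]

# Partition into the response variable and the feature rows.
labels = [label for label, _ in parsed]
features = [row for _, row in parsed]
```

In practice load_svmlight_file does this parsing for you and keeps the features sparse; the hand-written version is only meant to show what the file contains.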