Python PyTorch: How to use DataLoaders for custom datasets
Original source: http://stackoverflow.com/questions/41924453/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
PyTorch: How to use DataLoaders for custom Datasets
Asked by Sarthak
How can I make use of torch.utils.data.Dataset and torch.utils.data.DataLoader on my own data (not just the torchvision.datasets)?
Is there a way to use the built-in DataLoaders, which are used with TorchVisionDatasets, on any dataset?
Answered by pho7
Yes, that is possible. Just create the objects by yourself, e.g.
import torch.utils.data as data_utils
train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)
where features and targets are tensors. In the simplest case features is 2-D, i.e. a matrix in which each row represents one training sample, and targets may be 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector.
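To make the snippet above runnable end to end, here is a minimal sketch. The toy data (200 random samples with 10 features each and one scalar target per sample) is an assumption made purely for illustration; it is not part of the original answer.

```python
import torch
import torch.utils.data as data_utils

# Hypothetical toy data, invented for this example.
features = torch.randn(200, 10)  # 2-D: one row per training sample
targets = torch.randn(200)       # 1-D: one scalar target per sample

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

# Each iteration yields one mini-batch as a (features, targets) pair.
batch_shapes = [(x.shape, y.shape) for x, y in train_loader]
```

With 200 samples and batch_size=50, the loader yields four batches per epoch, each holding 50 rows of features and the 50 matching targets.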
Hope that helps!
EDIT: response to @sarthak's question
Basically yes. If you create an object of type TensorDataset, then the constructor checks whether the first dimensions of the feature tensor (which is actually called data_tensor in the PyTorch release this answer was written against) and the target tensor (called target_tensor) have the same length:
assert data_tensor.size(0) == target_tensor.size(0)
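As a quick check of this behaviour, the sketch below assumes a recent PyTorch release, where TensorDataset accepts any number of tensors and still enforces the matching first dimension with an assert. The 4x4x3 sample size is an arbitrary stand-in for the n x n x 3 data from the question.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical image-like data: 5000 samples of shape 4x4x3 (n = 4 is arbitrary).
images = torch.zeros(5000, 4, 4, 3)
labels = torch.zeros(5000)

# Accepted: both tensors have 5000 entries along dimension 0.
dataset = TensorDataset(images, labels)
sample, label = dataset[0]  # each sample keeps its 4x4x3 shape

# Rejected: 5000 entries versus 100 entries along dimension 0.
try:
    TensorDataset(images, labels[:100])
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
```

So the dataset object itself is indifferent to how many dimensions each sample has; only the number of samples must agree.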
However, if you want to feed these data into a neural network subsequently, then you need to be careful. While convolution layers work on data like yours, (I think) all of the other types of layers expect the data to be given in matrix form. So, if you run into an issue like this, an easy solution would be to convert your 4-D dataset (given as some kind of tensor, e.g. FloatTensor) into a matrix by using the method view. For your 5000xnxnx3 dataset, it would look like this:
dataset_2d = dataset_4d.view(5000, -1)
(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
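A small self-contained sketch of this reshaping follows. The side length n = 8 and the variable names are assumptions chosen for the example, since the original question leaves n unspecified.

```python
import torch

n = 8  # hypothetical image side length
dataset_4d = torch.zeros(5000, n, n, 3)

# Keep the sample dimension and flatten everything else into one row per sample.
dataset_2d = dataset_4d.view(5000, -1)
print(dataset_2d.shape)  # torch.Size([5000, 192]), since 8 * 8 * 3 = 192
```

The number of samples stays at 5000, and each sample becomes a single row of n * n * 3 values.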
Answered by user3693922
You can easily do this by extending the data.Dataset class.
According to the API, all you have to do is implement two methods: __getitem__ and __len__.
You can then wrap the dataset with the DataLoader as shown in the API and in @pho7's answer.
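As an illustration of that approach, here is a minimal sketch of a Dataset subclass wrapped in a DataLoader. The class SquaresDataset and its contents are hypothetical, invented for this example; a real dataset would load samples from files or arrays inside __getitem__.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples; the DataLoader's sampler uses it to know which indices exist.
        return self.size

    def __getitem__(self, index):
        # Return one (input, target) pair for the given integer index.
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index * index)])
        return x, y


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The default collate function stacks the per-sample tensors into batches.
batches = [(x, y) for x, y in loader]
```

With 10 samples and batch_size=4, the loader produces two full batches and a final batch of 2, because drop_last defaults to False.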
I think the ImageFolder class is a good reference; see its source code in torchvision.
Answered by Blupon
In addition to user3693922's answer and the accepted answer, which respectively link the "quick" PyTorch documentation example for creating custom dataloaders for custom datasets and create a custom dataloader in the "simplest" case, there is a much more detailed, dedicated official PyTorch tutorial on how to create a custom dataloader with the associated preprocessing: "Writing Custom Datasets, DataLoaders and Transforms".