Python PyTorch: How to use DataLoaders for custom datasets

Question

Asked by Sarthak

How can I use torch.utils.data.Dataset and torch.utils.data.DataLoader on my own data (not just the torchvision.datasets)?

Is there a way to use the built-in DataLoaders, which are used with the torchvision datasets, on any dataset?

Answer 1

Answered by pho7

Yes, that is possible. Just create the objects by yourself, e.g.

import torch.utils.data as data_utils

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

where features and targets are tensors. features is typically 2-D, i.e. a matrix in which each row represents one training sample, and targets may be 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector.
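
For a self-contained version of the snippet above, here is a minimal sketch. The toy tensors (200 samples, 8 features, scalar targets) are made up for illustration and are not part of the original answer.

```python
import torch
import torch.utils.data as data_utils

# Hypothetical toy data: 200 samples with 8 features each, one scalar target per sample.
features = torch.randn(200, 8)
targets = torch.randn(200)

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

# Each iteration yields one (features, targets) mini-batch.
batch_shapes = [(x.shape, y.shape) for x, y in train_loader]
```

With 200 samples and batch_size=50, the loader should produce four batches per epoch.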

Hope that helps!

EDIT: response to @sarthak's question

Basically yes. If you create an object of type TensorDataset, the constructor checks whether the first dimensions of the feature tensor (which is actually called data_tensor) and the target tensor (called target_tensor) have the same length:

assert data_tensor.size(0) == target_tensor.size(0)
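
To see this check in action, here is a small sketch, assuming a recent PyTorch release in which TensorDataset accepts any number of tensors and performs the same first-dimension check with an assert. The image-like shapes are hypothetical.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical image-like data: 100 samples of shape 4x4x3, one label each.
images = torch.randn(100, 4, 4, 3)
labels = torch.randint(0, 10, (100,))

# Accepted: both tensors have 100 entries along dimension 0.
dataset = TensorDataset(images, labels)
sample, label = dataset[0]

# Mismatched first dimensions are rejected when the dataset is constructed.
try:
    TensorDataset(images, labels[:50])
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
```

Note that the 4-D feature tensor is accepted as-is; only the sizes along the first dimension are compared.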

However, if you want to feed these data into a neural network subsequently, then you need to be careful. While convolution layers work on data like yours, (I think) all of the other types of layers expect the data to be given in matrix form. So, if you run into an issue like this, then an easy solution would be to convert your 4D-dataset (given as some kind of tensor, e.g. FloatTensor) into a matrix by using the method view. For your 5000xnxnx3 dataset, this would look like this:

dataset_2d = dataset_4d.view(5000, -1)

(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
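
As a runnable illustration of the reshape, here is a short sketch. The side length n = 4 is an arbitrary assumption, chosen only to keep the example small.

```python
import torch

n = 4  # hypothetical image side length
dataset_4d = torch.randn(5000, n, n, 3)

# Keep the sample dimension and flatten the rest: n * n * 3 values per row.
dataset_2d = dataset_4d.view(5000, -1)
```

view does not copy the data; the 2-D tensor shares storage with the original 4-D tensor.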

Answer 2

Answered by user3693922

You can easily do this by extending the data.Dataset class. According to the API, all you have to do is implement two methods: __getitem__ and __len__.

You can then wrap the dataset with the DataLoader as shown in the API and in @pho7 's answer.
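
A minimal sketch of such a subclass might look like the following. SquaresDataset is a made-up toy class for illustration, not something from the PyTorch API.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples; the DataLoader uses it for sampling and batching.
        return self.size

    def __getitem__(self, index):
        # Return one sample; the default collate function stacks samples into batches.
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index) ** 2])
        return x, y


loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batches = list(loader)
```

In a real dataset, __getitem__ would typically load a file from disk or index into an array instead of computing values on the fly.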

I think the ImageFolder class is a good reference. See code here.

Answer 3

Answered by Blupon

In addition to user3693922's answer and the accepted answer, which respectively link to the "quick" PyTorch documentation example for creating custom dataloaders for custom datasets and show how to create a dataloader in the "simplest" case, there is a much more detailed official PyTorch tutorial on creating a custom dataloader with the associated preprocessing: "Writing Custom Datasets, DataLoaders and Transforms".
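
To give a rough idea of the "associated preprocessing" part, here is a sketch of a dataset that takes an optional transform callable. The class name, the scale_to_unit function, and the toy data are hypothetical and are not taken from the tutorial itself.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TransformedDataset(Dataset):
    """Hypothetical dataset that applies an optional transform to each sample's features."""

    def __init__(self, features, targets, transform=None):
        self.features = features
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        x = self.features[index]
        if self.transform is not None:
            x = self.transform(x)  # preprocessing runs lazily, once per requested sample
        return x, self.targets[index]


def scale_to_unit(x):
    # Hypothetical preprocessing step: map 0..255 pixel values to 0..1.
    return x / 255.0


features = torch.full((6, 2, 2), 255.0)
targets = torch.arange(6)
dataset = TransformedDataset(features, targets, transform=scale_to_unit)
x_batch, y_batch = next(iter(DataLoader(dataset, batch_size=3)))
```

Passing the transform as a constructor argument keeps the loading logic and the preprocessing separate, so the same dataset class can be reused with different preprocessing steps.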
