Python: how to load a list of numpy arrays into a PyTorch dataset loader?

Question

Asked by deepayan das

I have a huge list of numpy arrays, where each array represents an image, and I want to load them using a torch.utils.data.DataLoader object. But the documentation of torch.utils.data.DataLoader mentions that it loads data directly from a folder. How do I modify it for my case? I am new to PyTorch and any help would be greatly appreciated. My numpy array for a single image looks something like this. The image is an RGB image.

`[[[ 70  82  94]
  [ 67  81  93]
  [ 66  82  94]
  ..., 
  [182 182 188]
  [183 183 189]
  [188 186 192]]

 [[ 66  80  92]
  [ 62  78  91]
  [ 64  79  95]
  ..., 
  [176 176 182]
  [178 178 184]
  [180 180 186]]

 [[ 62  82  93]
  [ 62  81  96]
  [ 65  80  99]
  ..., 
  [169 172 177]
  [173 173 179]
  [172 172 178]]

 ..., 
`

Answer 1

Answered by mexmex

I think what DataLoader actually requires is an input that subclasses Dataset. You can either write your own dataset class that subclasses Dataset or use TensorDataset, as I have done below:

import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader

my_x = [np.array([[1.0,2],[3,4]]),np.array([[5.,6],[7,8]])] # a list of numpy arrays
my_y = [np.array([4.]), np.array([2.])] # another list of numpy arrays (targets)

tensor_x = torch.Tensor(np.array(my_x)) # stack the list into one array, then transform to torch tensor
tensor_y = torch.Tensor(np.array(my_y))

my_dataset = TensorDataset(tensor_x,tensor_y) # create your dataset
my_dataloader = DataLoader(my_dataset) # create your dataloader

Works for me. Hope it helps you.
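
A slightly extended sketch of the same approach, assuming every array in each list has the same shape: stacking the list with np.stack first means the tensor is built from one contiguous array (recent PyTorch versions warn that building a tensor from a Python list of ndarrays is slow), and batch_size controls how many samples each iteration yields.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

my_x = [np.array([[1.0, 2], [3, 4]]), np.array([[5.0, 6], [7, 8]])]  # inputs
my_y = [np.array([4.0]), np.array([2.0])]                            # targets

# Stack each list into one (N, ...) array, then convert to a tensor once.
tensor_x = torch.from_numpy(np.stack(my_x)).float()  # shape (2, 2, 2)
tensor_y = torch.from_numpy(np.stack(my_y)).float()  # shape (2, 1)

my_dataset = TensorDataset(tensor_x, tensor_y)
my_dataloader = DataLoader(my_dataset, batch_size=2, shuffle=False)

for xb, yb in my_dataloader:
    print(xb.shape, yb.shape)  # torch.Size([2, 2, 2]) torch.Size([2, 1])
```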

Answer 2

Answered by prosti

PyTorch DataLoader needs a Dataset, as you can check in the docs. The right way to do that is to use:

torch.utils.data.TensorDataset(*tensors)

This is a Dataset wrapping tensors: each sample is retrieved by indexing the tensors along the first dimension. The *tensors arguments must all have the same size in the first dimension.
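
A small sketch of that indexing rule, using made-up zero tensors for illustration: ds[i] returns the i-th slice of every wrapped tensor, and tensors whose first dimensions disagree are rejected. In the PyTorch versions I am aware of, the size check is an assert, so the sketch catches AssertionError; treat that exception type as an assumption.

```python
import torch
from torch.utils.data import TensorDataset

images = torch.zeros(4, 3, 8, 8)     # 4 samples; dim 0 is the sample index
labels = torch.tensor([0, 1, 0, 1])  # also 4 entries along dim 0

ds = TensorDataset(images, labels)
x, y = ds[1]                # equivalent to (images[1], labels[1])
print(len(ds), x.shape, y)  # 4 torch.Size([3, 8, 8]) tensor(1)

# Tensors whose first dimensions differ are rejected when the dataset is built.
rejected = False
try:
    TensorDataset(images, torch.tensor([0, 1, 0]))
except AssertionError:
    rejected = True
print(rejected)  # True
```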

The other class, torch.utils.data.Dataset, is an abstract class.
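
Subclassing Dataset only requires __getitem__ (and, when the DataLoader uses its default sampler, __len__). A minimal sketch, where NumpyListDataset is a hypothetical name and the arrays are toy data: each sample is converted to a tensor only when it is requested, so the whole list never has to be converted up front.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpyListDataset(Dataset):
    """Hypothetical minimal Dataset over parallel lists of numpy arrays."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        self.xs = xs
        self.ys = ys

    def __len__(self):
        # The default sampler uses this to know how many indices exist.
        return len(self.xs)

    def __getitem__(self, index):
        # Convert lazily, one sample at a time.
        x = torch.from_numpy(self.xs[index]).float()
        y = torch.from_numpy(self.ys[index]).float()
        return x, y


xs = [np.full((2, 2), i, dtype=np.float64) for i in range(5)]
ys = [np.array([i], dtype=np.float64) for i in range(5)]
loader = DataLoader(NumpyListDataset(xs, ys), batch_size=2)
print([xb.shape[0] for xb, _ in loader])  # [2, 2, 1]
```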

Here is how to convert numpy arrays to tensors:

import torch
import numpy as np
n = np.arange(10)
print(n) #[0 1 2 3 4 5 6 7 8 9]
t1 = torch.Tensor(n)  # as torch.float32
print(t1) #tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
t2 = torch.from_numpy(n)  # keeps n's dtype: int32 here, int64 on most Linux/macOS setups
print(t2) #tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.int32)
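
To make the dtype and memory behaviour concrete, here is a sketch comparing three constructors (torch.tensor is added only for comparison and is not part of the answer above): torch.Tensor converts to the default float dtype, while torch.from_numpy keeps the NumPy dtype and shares memory with the array, so later writes to the array show up in the tensor. torch.tensor keeps the dtype but copies.

```python
import numpy as np
import torch

n = np.arange(10)
t_float = torch.Tensor(n)       # converted to the default float dtype (float32)
t_shared = torch.from_numpy(n)  # keeps n's dtype and shares n's memory
t_copy = torch.tensor(n)        # keeps n's dtype but copies the data

n[0] = 100  # write to the NumPy array after the tensors were created
print(t_float[0].item(), t_shared[0].item(), t_copy[0].item())  # 0.0 100 0
print(n.dtype, t_shared.dtype)  # platform-dependent integer dtype, matching on both sides
```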

The accepted answer used the torch.Tensor constructor. If you have an image with pixel values from 0-255, you may use this:

timg = torch.from_numpy(img).float()
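
The array in the question is printed as rows of [R G B] triples, i.e. height x width x 3 (HWC). Assuming the model expects channels first (CHW), as PyTorch convolution layers do, a sketch of the extra steps after .float(): permute the axes, optionally scale to [0, 1] (a common convention, not a requirement), and stack a list of equally sized images into one NCHW tensor. The image data here is random.

```python
import numpy as np
import torch

# A fake RGB image in the layout from the question: height x width x 3, uint8.
img = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)

timg = torch.from_numpy(img).float()  # values still 0-255, layout still HWC
print(timg.shape)                      # torch.Size([4, 5, 3])

# Convolution layers expect channels first; scaling to [0, 1] is optional.
chw = timg.permute(2, 0, 1) / 255.0
print(chw.shape)                       # torch.Size([3, 4, 5])

# A list of same-sized images becomes one NCHW batch tensor.
imgs = [img, img.copy()]
batch = torch.from_numpy(np.stack(imgs)).permute(0, 3, 1, 2).float() / 255.0
print(batch.shape)                     # torch.Size([2, 3, 4, 5])
```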

Or use torchvision's to_tensor method, which converts a PIL Image or numpy.ndarray to a tensor (it also reorders an H x W x C array to C x H x W and scales uint8 values to [0, 1]).

But here is a little trick: you can pass your numpy arrays to the DataLoader directly.

import numpy as np
from torch.utils.data import DataLoader
x1 = np.array([1,2,3])
d1 = DataLoader(x1, batch_size=3)

This also works, but if you print the type of d1.dataset:

print(type(d1.dataset)) # <class 'numpy.ndarray'>

However, we actually need tensors to work with CUDA, so it is better to feed the DataLoader with tensors.
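
A quick check of what the DataLoader actually yields in that case: the stored dataset stays an ndarray, but PyTorch's default collate function turns each batch of NumPy elements into a torch.Tensor. This sketch assumes the default collate_fn; passing a custom collate_fn would change the result.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

x1 = np.array([1, 2, 3])
d1 = DataLoader(x1, batch_size=3)

print(type(d1.dataset))  # <class 'numpy.ndarray'>
batch = next(iter(d1))
print(type(batch))       # <class 'torch.Tensor'>
print(batch.tolist())    # [1, 2, 3]
```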

Answer 3

Answered by X ? A-12

Since you have images, you probably want to perform transformations on them, so TensorDataset is not the best option here. Instead, you can create your own Dataset, something like this:

import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np
from PIL import Image
from torchvision import transforms


class MyDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = torch.LongTensor(targets)
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.targets[index]

        if self.transform:
            # PIL expects HWC uint8, so convert from the CHW layout used below
            x = Image.fromarray(self.data[index].astype(np.uint8).transpose(1,2,0))
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)

# Let's create 10 RGB images of size 128x128 (CHW layout, pixel values 0-255) and ten labels {0, 1}
data = list(np.random.randint(0, 256, size=(10, 3, 128, 128)))
targets = list(np.random.randint(2, size=(10)))

transform = transforms.Compose([transforms.Resize(64), transforms.ToTensor()])
dataset = MyDataset(data, targets, transform=transform)
dataloader = DataLoader(dataset, batch_size=5)
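
If you would rather skip the PIL round trip (and the torchvision dependency), a variant can convert each sample to a float tensor first and apply any callable that works on tensors. This is only a sketch: TensorTransformDataset and random_hflip are hypothetical names, not library APIs, and the random data mirrors the example above.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class TensorTransformDataset(Dataset):
    """Hypothetical variant of MyDataset: CHW arrays -> float tensors,
    then `transform` is applied to the tensor instead of a PIL image."""

    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = torch.as_tensor(np.asarray(targets), dtype=torch.long)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # CHW float32 tensor with values in [0, 1]
        x = torch.from_numpy(self.data[index].astype(np.float32) / 255.0)
        if self.transform is not None:
            x = self.transform(x)
        return x, self.targets[index]


def random_hflip(x, p=0.5):
    # Flip along the width axis (last dim of a CHW tensor) with probability p.
    return torch.flip(x, dims=[-1]) if torch.rand(1).item() < p else x


data = list(np.random.randint(0, 256, size=(10, 3, 128, 128)))
targets = list(np.random.randint(2, size=10))
dataset = TensorTransformDataset(data, targets, transform=random_hflip)
dataloader = DataLoader(dataset, batch_size=5)
for xb, yb in dataloader:
    print(xb.shape, yb.shape)  # torch.Size([5, 3, 128, 128]) torch.Size([5])
```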

Answer 4

Answered by Tej Chaugule

For the custom dataset given above by @Andreas K., we get the error: name 'transforms' is not defined.
