Data Augmentation in PyTorch

Disclaimer: this page is adapted from a popular StackOverflow question and its answers, licensed under CC BY-SA 4.0. If you reuse or share it, you must do so under the same license and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/51677788/


Tags: python, image-processing, dataset, pytorch, data-augmentation

Asked by H.S

I am a little bit confused about the data augmentation performed in PyTorch. As far as I know, when we perform data augmentation, we KEEP our original dataset and then add other versions of it (flipping, cropping, etc.). But that doesn't seem to be what happens in PyTorch. As far as I understood from the references, when we use data.transforms in PyTorch, it applies them one by one. So for example:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Here, for the training, we first randomly crop the image and resize it to shape (224, 224). Then we take these (224, 224) images and horizontally flip them. Therefore, our dataset now contains ONLY the horizontally flipped images, so our original images are lost in this case.

Am I right? Is this understanding correct? If not, where in the code above (taken from the official documentation) do we tell PyTorch to keep the original images and resize them to the expected shape (224, 224)?

Thanks

Accepted answer by benjaminplanche

The transforms operations are applied to your original images at every batch generation. So your dataset is left unchanged; only the batch images are copied and transformed at every iteration.
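
To make this concrete, here is a minimal sketch (not from the original answer) using torchvision's FakeData as a stand-in dataset; over several epochs, the loader keeps serving new random variants of the same 8 underlying images:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# FakeData synthesizes 8 fixed random images; the transform below runs
# freshly every time an item is fetched, not once at dataset creation
train_set = datasets.FakeData(
    size=8,
    image_size=(3, 224, 224),
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(train_set, batch_size=4)

for epoch in range(2):
    for images, labels in loader:
        # same 8 source images, but differently cropped/flipped each epoch
        print(epoch, images.shape)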

The confusion may come from the fact that often, like in your example, transforms are used both for data preparation (resizing/cropping to expected dimensions, normalizing values, etc.) and for data augmentation (randomizing the resizing/cropping, randomly flipping the images, etc.).



What your data_transforms['train'] does is:

  • Randomly resize the provided image and randomly crop it to obtain a (224, 224) patch
  • Apply or not a random horizontal flip to this patch, with a 50/50 chance
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

What your data_transforms['val'] does is:

  • Resize your image to (256, 256)
  • Center crop the resized image to obtain a (224, 224) patch
  • Convert it to a Tensor
  • Normalize the resulting Tensor, given the mean and deviation values you provided

(i.e. the random resizing/cropping for the training data is replaced by a fixed operation for the validation one, to have reliable validation results)



If you don't want your training images to be horizontally flipped with a 50/50 chance, just remove the transforms.RandomHorizontalFlip() line.

Similarly, if you want your images to always be center-cropped, replace transforms.RandomResizedCrop with transforms.Resize and transforms.CenterCrop, as done for data_transforms['val'] (see the sketch below).
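
As a minimal sketch (adapted from the snippets above, not part of the original answer), a fully deterministic training pipeline would then look like this:

from torchvision import transforms

deterministic_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # always the same central patch, no randomness
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])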

Answered by Ashkan372

I assume you are asking whether these data augmentation transforms (e.g. RandomHorizontalFlip) actually increase the size of the dataset as well, or whether they are applied to each item in the dataset one by one without adding to the size of the dataset.

Running the following simple code snippet, we can observe that the latter is true, i.e. if you have a dataset of 8 images and create a PyTorch dataset object for it, then when you iterate through the dataset, the transformations are called on each data point and the transformed data point is returned. So for example, if you have random flipping, some of the data points are returned as the original and some are returned as flipped (e.g. 4 flipped and 4 original). In other words, in one iteration through the dataset items, you get 8 data points (some flipped and some not). [This is at odds with the conventional understanding of augmenting the dataset (e.g. having 16 data points in the augmented dataset in this case); see the sketch after the output below.]

import torch
from torch.utils.data import Dataset
from torchvision import transforms


class experimental_dataset(Dataset):

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __len__(self):
        # number of samples (not len() of an int)
        return self.data.shape[0]

    def __getitem__(self, idx):
        # the transform runs here, every time an item is fetched
        item = self.data[idx]
        item = self.transform(item)
        return item


transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

x = torch.rand(8, 1, 2, 2)
print(x)

dataset = experimental_dataset(x, transform)

for item in dataset:
    print(item)

Results (the small differences in the floating-point values are caused by converting to a PIL image and back):

Original dummy dataset:

tensor([[[[0.1872, 0.5518],
          [0.5733, 0.6593]]],


        [[[0.6570, 0.6487],
          [0.4415, 0.5883]]],


        [[[0.5682, 0.3294],
          [0.9346, 0.1243]]],


        [[[0.1829, 0.5607],
          [0.3661, 0.6277]]],


        [[[0.1201, 0.1574],
          [0.4224, 0.6146]]],


        [[[0.9301, 0.3369],
          [0.9210, 0.9616]]],


        [[[0.8567, 0.2297],
          [0.1789, 0.8954]]],


        [[[0.0068, 0.8932],
          [0.9971, 0.3548]]]])

Transformed dataset:

tensor([[[0.1843, 0.5490],
         [0.5725, 0.6588]]])
tensor([[[0.6549, 0.6471],
         [0.4392, 0.5882]]])
tensor([[[0.5647, 0.3255],
         [0.9333, 0.1216]]])
tensor([[[0.5569, 0.1804],
         [0.6275, 0.3647]]])
tensor([[[0.1569, 0.1176],
         [0.6118, 0.4196]]])
tensor([[[0.9294, 0.3333],
         [0.9176, 0.9608]]])
tensor([[[0.8549, 0.2275],
         [0.1765, 0.8941]]])
tensor([[[0.8902, 0.0039],
         [0.3529, 0.9961]]])
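
If you actually wanted the "conventional" enlarged dataset (16 data points in this example), a minimal sketch (not part of the original answer, reusing the experimental_dataset class and the tensor x defined above) could concatenate the originals with a deterministically flipped copy via torch.utils.data.ConcatDataset:

from torch.utils.data import ConcatDataset

# identity pipeline keeps the originals; p=1.0 always flips
plain = transforms.Compose([transforms.ToPILImage(), transforms.ToTensor()])
flipped = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ToTensor()
])

doubled = ConcatDataset([
    experimental_dataset(x, plain),    # 8 original images
    experimental_dataset(x, flipped),  # 8 flipped copies
])
print(len(doubled))  # 16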