Python TensorFlow Allocation Memory: Allocation of 38535168 exceeds 10% of system memory

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/50304156/


Tensorflow Allocation Memory: Allocation of 38535168 exceeds 10% of system memory

python, tensorflow, memory, keras-layer, resnet

Asked by Madhi

Using ResNet50 pre-trained weights, I am trying to build a classifier. The code base is fully implemented with the Keras high-level TensorFlow API. The complete code is posted at the GitHub link below.

Source Code: Classification Using ResNet50 Architecture

The file size of the pre-trained model is 94.7 MB.

I loaded the pre-trained weights file:

from keras.models import Sequential
from keras.applications import ResNet50

# Load ResNet50 without its classification head; pool feature maps into a vector
new_model = Sequential()
new_model.add(ResNet50(include_top=False,
                       pooling='avg',
                       weights=resnet_weight_paths))

and fit the model:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 12,
    class_mode = 'categorical'
    )

validation_generator = data_generator.flow_from_directory(
    'path_to_the_validation_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    class_mode = 'categorical'
    )

#compile the model

new_model.fit_generator(
    train_generator,
    steps_per_epoch = 3,
    validation_data = validation_generator,
    validation_steps = 1
)

In the training dataset, I have two folders, dog and cat, each holding almost 10,000 images. When I ran the script, I got the following warnings:

Epoch 1/1
2018-05-12 13:04:45.847298: W tensorflow/core/framework/allocator.cc:101] Allocation of 38535168 exceeds 10% of system memory.
2018-05-12 13:04:46.845021: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:47.552176: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:48.199240: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:48.918930: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
2018-05-12 13:04:49.274137: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:49.647061: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:50.028839: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
2018-05-12 13:04:50.413735: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.

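As a side note on where these numbers come from: assuming IMG_SIZE is 224 and float32 activations (both assumptions on my part), 38535168 bytes is exactly one batch of ResNet50's first convolution output, so the warnings refer to intermediate activation buffers rather than the weights file itself. A quick sanity check:

# Rough sanity check (assumes IMG_SIZE = 224, float32, standard ResNet50)
batch_size = 12
conv1_elements = 112 * 112 * 64           # conv1 output shape for a 224x224 input
print(batch_size * conv1_elements * 4)    # 4 bytes per float32 -> 38535168
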
Any ideas on how to optimize the way the pre-trained model is loaded, or how to get rid of this warning message?

Thanks!

Accepted answer by jaba_y

Try reducing the batch_size attribute to a small number (like 1, 2, or 3). Example:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 2,
    class_mode = 'categorical'
    )
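
Note also that the validation generator in the question does not set batch_size at all, so it falls back to the Keras default of 32. Reducing it as well may help (a sketch reusing the question's names):

validation_generator = data_generator.flow_from_directory(
    'path_to_the_validation_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 2,          # defaults to 32 when omitted
    class_mode = 'categorical'
    )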

Answer by revolutionary

I was having the same problem while running a TensorFlow container with Docker and a Jupyter notebook. I was able to fix it by increasing the container's memory.

On macOS, you can easily do this from:

       Docker Icon > Preferences >  Advanced > Memory

Drag the slider to the maximum (e.g. 4 GB) and click Apply; this will restart the Docker engine.

Now run your TensorFlow container again.

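If you start the container from the command line instead of through the Docker Desktop UI, you can also raise the limit per container with the --memory flag (a sketch; the image tag and port are just examples):

docker run --memory=4g -p 8888:8888 tensorflow/tensorflow:latest-jupyter
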
It was handy to run the docker stats command in a separate terminal. It shows the container's memory usage in real time, so you can see how much memory consumption grows:

CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT     MEM %    NET I/O             BLOCK I/O           PIDS
3170c0b402cc   mytf   0.04%   588.6MiB / 3.855GiB   14.91%   13.1MB / 3.06MB     214MB / 3.13MB      21

Answer by Poik

Alternatively, you can set the environment variable TF_CPP_MIN_LOG_LEVEL=2 to filter out info and warning messages. I found that on a GitHub issue where people complain about the same output. To do so within Python, you can use the solution from here:

import os
# Set this before importing TensorFlow, so the C++ logging level takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf

You can even turn it on and off at will with this. I test for the maximum possible batch size before running my code, and I can disable warnings and errors while doing this.

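For reference, a minimal sketch of such a batch-size probe, assuming a compiled Keras model and in-memory arrays x and y (the function name and the broad exception handling are my own):

def find_max_batch_size(model, x, y, start=256):
    # Halve the batch size until a single training step succeeds
    batch_size = start
    while batch_size >= 1:
        try:
            model.train_on_batch(x[:batch_size], y[:batch_size])
            return batch_size
        except Exception:  # out-of-memory errors surface as exceptions here
            batch_size //= 2
    return None
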
Answer by Ahmed J.

I was running a small model on a CPU and had the same issue. Adding os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' resolved it.

Answer by Ahmad Alomari

I was having the same problem, and I concluded that there are two factors to consider when you see this error:

1. batch_size: this determines the amount of data processed in each step
2. image_size: larger image dimensions mean more data to process

When these two factors are too large, the RAM cannot hold all of the required data.

To solve the problem, I tried two changes: first, reducing batch_size from 32 to 3 or 2; second, reducing image_size from (608, 608) to (416, 416).

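As a rough back-of-the-envelope check of how much these two changes save on the input batch alone (activations and gradients add considerably more on top):

def input_batch_bytes(batch, height, width, channels=3, dtype_bytes=4):
    # Size of one float32 input batch, ignoring activations and gradients
    return batch * height * width * channels * dtype_bytes

print(input_batch_bytes(32, 608, 608))  # ~142 MB before the changes
print(input_batch_bytes(2, 416, 416))   # ~4 MB after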