Increasing memory limit in Python?

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/44508254/

Date: 2020-08-20 00:04:24  Source: igfitidea

Increasing memory limit in Python?

python, memory-management

Asked by Maor

I am currently using a function that builds extremely long dictionaries (used to compare DNA strings), and sometimes I get a MemoryError. Is there a way to allot more memory to Python so it can deal with more data at once?

Answer by cs95

Python doesn't limit memory usage for your program. It will allocate as much memory as your program needs until your computer runs out of memory. The most you can do is reduce the limit to a fixed upper cap. That can be done with the resource module, but it isn't what you're looking for.

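For completeness, here is a minimal sketch of using the resource module to cap (not raise) a process's memory, assuming a POSIX system; the 4 GiB cap below is an arbitrary example value:

import resource

# Current address-space limits for this process; -1 means unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("current limits:", soft, hard)

# Cap the process at roughly 4 GiB of address space; allocations beyond
# this raise MemoryError instead of exhausting the whole machine.
resource.setrlimit(resource.RLIMIT_AS, (4 * 1024**3, hard))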

You'd need to look at making your code more memory/performance friendly.

Answer by napuzba

If you use Linux, you can try to extend memory with swap: a simple way to run programs that require more memory than is installed in the machine.

However, a better way is to update the program to handle the data in chunks if possible, or to add more memory to the machine, because there is a performance penalty with this method (it uses the slower disk device).

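As a rough sketch of chunked processing (the file name, chunk size, and per-record computation here are hypothetical placeholders), you could stream the DNA strings in fixed-size batches instead of building one huge dictionary:

def iter_chunks(path, chunk_size=100_000):
    """Yield dictionaries of at most chunk_size DNA records at a time."""
    chunk = {}
    with open(path) as f:
        for line in f:
            key = line.strip()
            chunk[key] = len(key)  # placeholder per-record computation
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = {}  # drop the reference so the old chunk can be freed
    if chunk:
        yield chunk

# Hypothetical usage:
# for chunk in iter_chunks("dna_strings.txt"):
#     compare_chunk(chunk)  # your comparison step, one chunk at a time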

Answer by Mandy

Python raises a MemoryError when it hits the limit of your system RAM, unless you have defined a lower limit manually with the resource package.

Defining your class with __slots__ lets the Python interpreter know that the attributes/members of your class are fixed, which can lead to significant memory savings!

You can reduce dict creation by the Python interpreter by using __slots__. This tells the interpreter not to create a __dict__ for each instance and to reuse the same fixed attribute layout instead.

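As a minimal sketch (the class names here are illustrative, not from the question), the difference is that a slotted class carries no per-instance __dict__:

import sys

class DnaPlain:
    def __init__(self, val):
        self.val = val

class DnaSlotted:
    __slots__ = ('val',)  # fixed attribute layout, no per-instance __dict__

    def __init__(self, val):
        self.val = val

print(sys.getsizeof(DnaPlain('ACGT').__dict__))  # dict overhead paid per instance
print(hasattr(DnaSlotted('ACGT'), '__dict__'))   # False: no dict to pay for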

If the memory consumed by your Python process continues to grow over time, it is usually a combination of:

  • How the C memory allocator in Python works. This is essentially memory fragmentation: the allocator cannot release a chunk back to the system unless the entire chunk is unused, and chunk usage is usually not perfectly aligned with the objects you are creating and using.
  • Using many small strings to compare data. Python interns some strings internally, but creating lots of short-lived small strings still puts load on the interpreter and allocator (see the interning sketch after this list).
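
As a small sketch of the interning point (the key values are made up), sys.intern ensures that repeated identical strings share one object instead of multiplying memory use:

import sys

# Build keys dynamically so CPython does not automatically share them,
# then intern them so duplicates collapse to one shared object.
raw = ["".join(["AC", "GT"]) for _ in range(1000)]
print(raw[0] is raw[1])  # False: 1000 separate but equal strings

interned = [sys.intern(s) for s in raw]
print(interned[0] is interned[1])  # True: one shared string object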

The best way is to create a worker thread or a single-threaded pool to do your work, and then invalidate or kill the worker to free up the resources attached to or used in it.

The code below creates a single-threaded worker:

import concurrent.futures
import logging
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

lock = threading.Lock()
errorResultMap = []

def process_dna_compare(dna1, dna2):
    # max_workers=1 creates a single-threaded pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key
                   for dna_key in dna1}
        count = 0
        for future in concurrent.futures.as_completed(futures):
            result_dict = future.result()
            if result_dict:
                count += 1
                # do your processing here
    logger.info('Total dna keys processed: %d', count)

def getDnaDict(lock, dna_key):
    '''Process dna_key here and return an item.'''
    try:
        dataItem = dna_key  # placeholder: look up/compute the data for dna_key
        return dataItem
    except Exception:
        with lock:  # the lock protects the shared error list
            errorResultMap.append({'dna_key': dna_key,
                                   'error': 'No data for dna found'})
        logger.error('Error in processing dna: %s', dna_key)

if __name__ == "__main__":
    dna1 = {}  # get data for dna1
    dna2 = {}  # get data for dna2
    process_dna_compare(dna1, dna2)
    if errorResultMap:
        print(errorResultMap)  # or write errorResultMap to a file

The code below will help you understand memory usage:

import objgraph
import random

class Dna(object):
    def __init__(self):
        self.val = None

    def __str__(self):
        return "dna - val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        # print("id of dna: {0}".format(id(dna)))
        # print("dna is: {0}".format(dna))
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
                           filename="dna_refs.png")
    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()

Output for memory usage:

list l has 3 objects of type Dna()
function                   2021
wrapper_descriptor         1072
dict                       998
method_descriptor          778
builtin_function_or_method 759
tuple                      667
weakref                    577
getset_descriptor          396
member_descriptor          296
type                       180

For more reading on slots, please visit: https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/
