python sys.intern 有什么作用,什么时候使用?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1136826/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 21:33:22  来源:igfitidea点击:

What does python sys.intern do, and when should it be used?

pythonmemorymemory-managementpython-3.x

提问by pufferfish

I came across this questionabout memory management of dictionaries, which mentions the internfunction. What exactly does it do, and when would it be used?

我遇到了这个关于字典内存管理的问题,其中提到了实习生功能。它究竟有什么作用,何时使用?

To give an example:

举个例子:

If I have a set called seen, that contains tuples in the form (string1,string2), which I use to check for duplicates, would storing (intern(string1),intern(string2)) improve performance w.r.t. memory or speed?

如果我有一个名为seen的集合,它包含形式 (string1,string2) 的元组,我用它来检查重复项,存储 (intern(string1),intern(string2)) 会提高性能wrt内存或速度吗?

回答by

From the Python 3documentation:

Python 3文档

sys.intern(string)

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.

在“interned”字符串表中输入 string 并返回 interned 字符串——它是字符串本身或副本。插入字符串对于在字典查找中获得一点性能很有用——如果字典中的键被插入,并且查找键被插入,则键比较(散列后)可以通过指针比较而不是字符串比较来完成。通常,Python 程序中使用的名称会自动插入,用于保存模块、类或实例属性的字典具有插入键。

实习字符串不是不朽的;您必须保留对 intern() 返回值的引用才能从中受益。

Clarification:

澄清

As the documentation suggests, the sys.internfunction is intended to be used for performance optimization.

正如文档所暗示的那样,该sys.intern函数旨在用于性能优化

The sys.internfunction maintains a table of internedstrings. When you attempt to intern a string, the function looks it up in the table and:

sys.intern函数维护一个内部字符串表。当您尝试对字符串进行实习时,该函数会在表中查找它并:

  1. If the string does not exists (hasn't been interned yet) the function saves it in the table and returns it from the interned strings table.

    >>> import sys
    >>> a = sys.intern('why do pangolins dream of quiche')
    >>> a
    'why do pangolins dream of quiche'
    

    In the above example, aholds the interned string. Even though it is not visible, the sys.internfunction has saved the 'why do pangolins dream of quiche'string object in the interned strings table.

  2. If the string exists (has been interned) the function returns it from the interned strings table.

    >>> b = sys.intern('why do pangolins dream of quiche')
    >>> b
    'why do pangolins dream of quiche'
    

    Even though it is not immediately visible, because the string 'why do pangolins dream of quiche'has been interned before, bholds now the same string object as a.

    >>> b is a
    True
    

    If we create the same string without using intern, we end up with two different string objects that have the same value.

    >>> c = 'why do pangolins dream of quiche'
    >>> c is a
    False
    >>> c is b
    False
    
  1. 如果该字符串不存在(尚未被实习),则该函数将其保存在表中并从实习字符串表中返回它。

    >>> import sys
    >>> a = sys.intern('why do pangolins dream of quiche')
    >>> a
    'why do pangolins dream of quiche'
    

    在上面的例子中,a保存了 interned 字符串。尽管它不可​​见,但该sys.intern函数已将'why do pangolins dream of quiche'字符串对象保存在内部字符串表中。

  2. 如果字符串存在(已被实习),则该函数从实习字符串表中返回它。

    >>> b = sys.intern('why do pangolins dream of quiche')
    >>> b
    'why do pangolins dream of quiche'
    

    即使它不是立即可见的,因为该字符串'why do pangolins dream of quiche'之前已经被实习过,b现在它与a.

    >>> b is a
    True
    

    如果我们在不使用实习生的情况下创建相同的字符串,我们最终会得到两个具有相同值的不同字符串对象。

    >>> c = 'why do pangolins dream of quiche'
    >>> c is a
    False
    >>> c is b
    False
    

By using sys.internyou ensure that you never create two string objects that have the same value—when you request the creation of a second string object with the same value as an existing string object, you receive a reference to the pre-existing string object. This way, you are saving memory. Also, string objects comparison is now very efficientbecause it is carried out by comparing the memory addresses of the two string objects instead of their content.

通过使用,sys.intern您可以确保永远不会创建两个具有相同值的字符串对象——当您请求创建与现有字符串对象具有相同值的第二个字符串对象时,您会收到对预先存在的字符串对象的引用。这样,您就节省了内存。此外,字符串对象比较现在非常有效,因为它是通过比较两个字符串对象的内存地址而不是它们的内容来执行的。

回答by Brian

Essentially intern looks up (or stores if not present) the string in a collection of interned strings, so all interned instances will share the same identity. You trade the one-time cost of looking up this string for faster comparisons (the compare can return True after just checking for identity, rather than having to compare each character), and reduced memory usage.

本质上,实习生在一组实习字符串中查找(或存储,如果不存在)字符串,因此所有实习实例将共享相同的身份。您将查找此字符串的一次性成本用于更快的比较(比较可以在检查身份后返回 True,而不必比较每个字符),并减少内存使用。

However, python will automatically intern strings that are small, or look like identifiers, so you may find you get no improvement because your strings are already being interned behind the scenes. For example:

但是,python 会自动对较小的字符串进行实习,或者看起来像 identifiers,因此您可能会发现没有任何改进,因为您的字符串已经在幕后实习。例如:

>>> a = 'abc'; b = 'abc'
>>> a is b
True

In the past, one disadvantage was that interned strings were permanent. Once interned, the string memory was never freed even after all references were dropped. I think this is no longer the case for more recent versions of python though.

过去,一个缺点是内部字符串是永久性的。一旦被实习,即使在所有引用都被删除之后,字符串内存也不会被释放。不过,我认为更新版本的 python 不再是这种情况。

回答by SilentGhost

They weren't talking about keyword internbecause there is no such thing in Python. They were talking about non-essential built-in function intern. Which in py3k has been moved to sys.intern. Docs have an exhaustive description.

他们不是在谈论关键字,intern因为 Python 中没有这样的东西。他们在谈论非必要的内置函数intern。py3k 中的哪个已移至sys.intern. 文档有详尽的描述。

回答by flybywire

It returns a canonical instance of the string.

它返回字符串的规范实例。

Therefore if you have many string instances that are equal you save memory, and in addition you can also compare canonicalized strings by identity instead of equality which is faster.

因此,如果您有许多相等的字符串实例,则可以节省内存,此外,您还可以通过身份而不是相等来比较规范化的字符串,这样会更快。