
Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/444475/

Date: 2020-11-03 20:08:22  Source: igfitidea

sqlalchemy, turning a list of IDs to a list of objects

python, sqlalchemy

Asked by Cheery

I have sequence of IDs I want to retrieve. It's simple:


session.query(Record).filter(Record.id.in_(seq)).all()

Is there a better way to do it?


Accepted answer by Adam Dziendziel

Your code is absolutely fine.


IN is like a bunch of X=Y clauses joined with OR and is pretty fast in contemporary databases.


However, if your list of IDs is long, you could make the query a bit more efficient by passing a sub-query returning the list of IDs.

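For illustration, here is a minimal sketch of the subquery variant. The `Tag` model and its columns are invented for this example, and the `select()` form assumes SQLAlchemy 1.4 or later:

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Tag(Base):  # hypothetical table holding the IDs we want
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    record_id = Column(Integer)
    label = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Record(id=i, name='r%d' % i) for i in range(1, 6)])
session.add_all([Tag(record_id=i, label='keep') for i in (2, 4)])
session.commit()

# The subquery renders as a nested SELECT inside IN (...), so the ID
# list never has to be materialized on the Python side.
id_subq = select(Tag.record_id).where(Tag.label == 'keep')
records = session.query(Record).filter(Record.id.in_(id_subq)).all()
print(sorted(r.id for r in records))  # [2, 4]
```

The tradeoff: a Python list of IDs must be serialized into the statement, while a subquery keeps the whole operation inside the database.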

Answered by zzzeek

The code as-is is completely fine. However, people sometimes ask for a way to hedge between the two approaches: doing one big IN versus using get() for individual IDs.


If someone is really trying to avoid the SELECT, then the best way to do that is to set up the objects you need in memory ahead of time. Say you're working through a large table of elements: break the work into chunks (for example, order the full set of work by primary key, or by date range), then load everything for each chunk locally into a cache:


 all_ids = [<huge list of ids>]

 all_ids.sort()
 while all_ids:
     chunk = all_ids[0:1000]

     # bonus exercise!  Throw each chunk into a multiprocessing.Pool()!
     all_ids = all_ids[1000:]

     # load every Record whose id falls in this chunk's range, keyed by id
     my_cache = dict(
           Session.query(Record.id, Record).filter(
                 Record.id.between(chunk[0], chunk[-1]))
     )

     for id_ in chunk:
         my_obj = my_cache[id_]
         <work on my_obj>

That's the real world use case.


But to also illustrate some SQLAlchemy API, we can write a function that does an IN query for the records we don't have and a local identity-map lookup for those we do. Here it is:


from sqlalchemy import inspect


def get_all(session, cls, seq):
    mapper = inspect(cls)
    lookup = set()
    for ident in seq:
        key = mapper.identity_key_from_primary_key((ident, ))
        if key in session.identity_map:
            yield session.identity_map[key]
        else:
            lookup.add(ident)
    if lookup:
        for obj in session.query(cls).filter(cls.id.in_(lookup)):
            yield obj

Here is a demonstration:


from sqlalchemy import Column, Integer, create_engine, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import random

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    data = Column(String)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

ids = range(1, 50)

s = Session(e)
s.add_all([A(id=i, data='a%d' % i) for i in ids])
s.commit()
s.close()

already_loaded = s.query(A).filter(A.id.in_(random.sample(ids, 10))).all()

assert len(s.identity_map) == 10

to_load = set(random.sample(ids, 25))
all_ = list(get_all(s, A, to_load))

assert set(x.id for x in all_) == to_load

Answered by G.J

If you use composite primary keys, you can use tuple_, as in


from sqlalchemy import tuple_
session.query(Record).filter(tuple_(Record.id1, Record.id2).in_(seq)).all()

Note that this is not available on SQLite (see the docs).

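On backends or versions where tuple IN is not supported, a portable fallback is to OR together one AND pair per key tuple. A minimal sketch with an invented composite-key model:

```python
from sqlalchemy import Column, Integer, and_, or_, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Pair(Base):  # hypothetical model with a composite primary key
    __tablename__ = 'pair'
    id1 = Column(Integer, primary_key=True)
    id2 = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Pair(id1=a, id2=b) for a in (1, 2) for b in (1, 2)])
session.commit()

seq = [(1, 2), (2, 1)]
# Equivalent to tuple_(Pair.id1, Pair.id2).in_(seq), spelled with OR/AND
cond = or_(*[and_(Pair.id1 == a, Pair.id2 == b) for a, b in seq])
rows = session.query(Pair).filter(cond).all()
print(sorted((p.id1, p.id2) for p in rows))  # [(1, 2), (2, 1)]
```

This generates a longer WHERE clause than tuple IN, so it is best kept to modest key lists.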

Answered by SingleNegationElimination

There is one other way: if it's reasonable to expect that the objects in question are already loaded into the session (you've accessed them earlier in the same transaction), you can instead do:


list(map(session.query(Record).get, seq))  # list() forces the lazy map on Python 3

In the case where those objects are already present, this will be much faster, since there won't be any queries to retrieve those objects. On the other hand, if more than a tiny number of those objects are not loaded, it will be much, much slower, since it will cause one query per missing instance, instead of a single query for all objects.


This can be useful when you are doing joinedload() queries before reaching the above step, so you can be sure that they have been loaded already. In general, you should use the solution in the question by default, and only explore this solution when you have seen that you are querying for the same objects over and over.


Answered by iny

I'd recommend taking a look at the SQL it produces. You can just print str(query) to see it.


I'm not aware of an ideal way of doing it with standard SQL.

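As a quick sketch of that (model invented for the example), stringifying a Query compiles the statement without executing it; the exact placeholder rendering varies by SQLAlchemy version and dialect:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Record(Base):  # hypothetical model
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)

session = Session(create_engine('sqlite://'))
query = session.query(Record).filter(Record.id.in_([1, 2, 3]))

# str(query) compiles to SQL without hitting the database.
print(str(query))
```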