MySQL 为什么 SQLAlchemy count() 比原始查询慢得多?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14754994/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 16:26:48  来源:igfitidea点击:

Why is SQLAlchemy count() much slower than the raw query?

mysqlsqlalchemy

提问by mtth

I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy countfunction takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?

我将 SQLAlchemy 与 MySQL 数据库一起使用,我想计算表中的行数(大约 300k)。SQLAlchemy count函数的运行时间大约是直接在 MySQL 中编写相同查询的 50 倍。难道我做错了什么?

# this takes over 3 seconds to return
session.query(Segment).count()

However:

然而:

SELECT COUNT(*) FROM segments;
+----------+
| COUNT(*) |
+----------+
|   281992 |
+----------+
1 row in set (0.07 sec)

The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).

速度差异随着表的大小而增加(在 100k 行下几乎不明显)。

Update

更新

Using session.query(Segment.id).count()instead of session.query(Segment).count()seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.

使用session.query(Segment.id).count()而不是session.query(Segment).count()似乎可以解决问题并加快速度。我仍然不明白为什么初始查询速度较慢。

回答by zzzeek

Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docspoint out that the "optimized" query can be achieved using query(func.count(Segment.id)):

不幸的是,MySQL 对子查询的支持非常糟糕,这对我们产生了非常负面的影响。在SQLAlchemy的文档指出,“优化”查询可以使用实现query(func.count(Segment.id))

Return a count of rows this Query would return.

This generates the SQL for this Query as follows:

SELECT count(1) AS count_1 FROM (
     SELECT <rest of query follows...> ) AS anon_1

For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:

from sqlalchemy import func

# count User records, without
# using a subquery.
session.query(func.count(User.id))

# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
        group_by(User.name)

from sqlalchemy import distinct

# count distinct "name" values
session.query(func.count(distinct(User.name)))

返回此查询将返回的行数。

这将为此查询生成 SQL,如下所示:

SELECT count(1) AS count_1 FROM (
     SELECT <rest of query follows...> ) AS anon_1

要对要计数的特定列进行细粒度控制,要跳过子查询的使用或以其他方式控制 FROM 子句,或使用其他聚合函数,请将 func 表达式与 query() 结合使用,即:

from sqlalchemy import func

# count User records, without
# using a subquery.
session.query(func.count(User.id))

# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
        group_by(User.name)

from sqlalchemy import distinct

# count distinct "name" values
session.query(func.count(distinct(User.name)))

回答by Jerry K.

It took me a long time to find this as the solution to my problem. I was getting the following error:

我花了很长时间才找到它来解决我的问题。我收到以下错误:

sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): Incorrect key file for table '/tmp/#sql_40ab_0.MYI'; try to repair it

sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): 表'/tmp/#sql_40ab_0.MYI'的密钥文件不正确;尝试修复它

The problem was resolved when I changed this:

当我更改此问题时,问题已解决:

query = session.query(rumorClass).filter(rumorClass.exchangeDataState == state)
return query.count()

to this:

对此:

query = session.query(func.count(rumorClass.id)).filter(rumorClass.exchangeDataState == state)
return query.scalar()

回答by aeb0

The reason is that SQLAlchemy's count() is counting the results of a subquery which is still doing the full amount of work to retrieve the rows you are counting. This behavior is agnostic of the underlying database; it isn't a problem with MySQL.

原因是 SQLAlchemy 的 count() 正在计算子查询的结果,该子查询仍在执行全部工作来检索您正在计算的行。这种行为与底层数据库无关;这不是 MySQL 的问题。

The SQLAlchemy docsexplain how to issue a count without a subquery by importing funcfrom sqlalchemy.

SQLAlchemy文档解释了如何通过funcsqlalchemy.

session.query(func.count(User.id)).scalar()

>>>SELECT count(users.id) AS count_1 \nFROM users')