Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16728904/

Time: 2020-10-21 00:55:56  Source: igfitidea

SqlAlchemy: count of distinct over multiple columns

Tags: postgresql, count, sqlalchemy, distinct, aggregate-functions

Asked by EoghanM

I can't do:

>>> session.query(
...     func.count(distinct(Hit.ip_address, Hit.user_agent))).first()
TypeError: distinct() takes exactly 1 argument (2 given)

I can do:

session.query(
    func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))).first()

That is fine for my case (counting unique users in a 'pageload' db table).

However, it isn't correct in the general case: different column pairs can concatenate to the same string, so it gives a count of 1 instead of 2 for the following table (both rows concatenate to 'xxyy'):

 col_a | col_b
----------------
  xx   |  yy
  xxy  |  y
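
To see why concatenation undercounts, the two rows above can be compared in plain Python. This is only an illustrative sketch: the `rows` list and the variable names are made up here and simply stand in for the table's contents.

```python
# The two rows of the example table, as (col_a, col_b) pairs.
rows = [("xx", "yy"), ("xxy", "y")]

# Concatenating the columns first loses the boundary between them,
# so both rows collapse to the same string "xxyy".
concat_count = len({a + b for a, b in rows})

# Comparing the pairs as tuples keeps the boundary, which is what
# count(distinct (col_a, col_b)) does on the database side.
tuple_count = len({(a, b) for a, b in rows})

print(concat_count, tuple_count)
```

The tuple comparison mirrors the row-value form `(col_a, col_b)` in the SQL asked for below.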

Is there any way to generate the following SQL (which is valid in postgresql at least)?

SELECT count(distinct (col_a, col_b)) FROM my_table;

Accepted answer by Ilja Everilä

The exact query can be produced using the tuple_() construct:

from sqlalchemy import distinct, func, tuple_

session.query(
    func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
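
As a rough way to check the SQL this produces without a database connection, the same expression can be compiled against the PostgreSQL dialect. The sketch below assumes SQLAlchemy 1.4 or later (for the `select()` calling style) and uses a hypothetical `hit` table definition standing in for the `Hit` model, whose real definition is not shown in the question.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, distinct, func, select, tuple_,
)
from sqlalchemy.dialects import postgresql

# Hypothetical table standing in for the Hit model from the question.
metadata = MetaData()
hit = Table(
    "hit",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("ip_address", String),
    Column("user_agent", String),
)

# count(DISTINCT (ip_address, user_agent)) built from a row-value tuple.
stmt = select(func.count(distinct(tuple_(hit.c.ip_address, hit.c.user_agent))))

# Render the statement as PostgreSQL SQL text; nothing is executed.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Compiling to a string only shows the generated SQL; running it still requires a backend that accepts row values inside count(distinct ...), as PostgreSQL does.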

Answer by RedNaxel

distinct() accepts more than one argument when appended to the query object:

session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()

It should generate something like:

SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1

This is even a bit closer to what you wanted, although DISTINCT ON is specific to PostgreSQL.

Answer by vvladymyrov

It looks like SQLAlchemy's distinct() accepts only one column or expression.

Another way around is to use group_by and count. This should be more efficient than concatenating the two columns, because with GROUP BY the database can use indexes if they exist:

session.query(Hit.ip_address, Hit.user_agent).\
    group_by(Hit.ip_address, Hit.user_agent).count()

The generated query would still look different from what you asked about:

SELECT count(*) AS count_1
FROM (SELECT hittable.ip_address AS hittable_ip_address, hittable.user_agent AS hittable_user_agent
FROM hittable GROUP BY hittable.ip_address, hittable.user_agent) AS anon_1
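
The behaviour of the subquery form can be tried out with Python's built-in sqlite3 module. This is a sketch using hand-written SQL against the `my_table` example from the question (with one duplicate row added for illustration), not the output of SQLAlchemy itself.

```python
import sqlite3

# In-memory database holding the example table from the question,
# plus one repeated row so that de-duplication is visible.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col_a TEXT, col_b TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("xx", "yy"), ("xxy", "y"), ("xx", "yy")],
)

# Count the groups produced by GROUP BY over both columns.
grouped_count = conn.execute(
    "SELECT count(*) FROM "
    "(SELECT col_a, col_b FROM my_table GROUP BY col_a, col_b) AS anon_1"
).fetchone()[0]

# For comparison: the concatenation approach from the question.
concat_count = conn.execute(
    "SELECT count(DISTINCT col_a || col_b) FROM my_table"
).fetchone()[0]

conn.close()
print(grouped_count, concat_count)
```

Grouping keeps the two columns separate, so the pairs ('xx', 'yy') and ('xxy', 'y') are counted as different, while the concatenated form merges them.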