Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21669202/

Time: 2020-09-02 23:20:42  Source: igfitidea

Find rows with multiple duplicate fields with Active Record, Rails & Postgres

ruby-on-rails postgresql activerecord

Asked by newUserNameHere

What is the best way to find records with duplicate values across multiple columns using Postgres and Active Record?

I found this solution here:

User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )

But it doesn't seem to work with Postgres. I'm getting this error:

PG::GroupingError: ERROR: column "parts.id" must appear in the GROUP BY clause or be used in an aggregate function

Answered by newUserNameHere

Tested & Working Version

User.select(:first,:email).group(:first,:email).having("count(*) > 1")

Also, this is a little unrelated but handy. If you want to see how many times each combination was found, put .size at the end:

User.select(:first,:email).group(:first,:email).having("count(*) > 1").size

and you'll get a result set back that looks like this:

{[nil, nil]=>512,
 ["Joe", "[email protected]"]=>23,
 ["Jim", "[email protected]"]=>36,
 ["John", "[email protected]"]=>21}

Thought that was pretty cool and hadn't seen it before.

Credit to Taryn, this is just a tweaked version of her answer.
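
To make the shape of that .size result concrete, here is a minimal plain-Ruby sketch that mimics the grouping and counting in memory. It is not the Active Record query itself: the hashes stand in for user rows, the sample names and addresses are made up, and duplicate_counts is a hypothetical helper name.

```ruby
# In-memory illustration only; hashes stand in for rows of the users table.
# duplicate_counts is a hypothetical helper, not part of Active Record.
def duplicate_counts(rows)
  rows
    .map { |row| [row[:first], row[:email]] } # the grouping key, like GROUP BY first, email
    .tally                                    # count rows per key, like count(*)
    .select { |_pair, count| count > 1 }      # keep repeated keys, like HAVING count(*) > 1
end

# Made-up sample data.
rows = [
  { id: 1, first: "Joe", email: "joe@example.com" },
  { id: 2, first: "Joe", email: "joe@example.com" },
  { id: 3, first: "Jim", email: "jim@example.com" },
  { id: 4, first: nil,   email: nil },
  { id: 5, first: nil,   email: nil }
]

p duplicate_counts(rows)
```

The keys are [first, email] arrays and the values are counts, which is the same shape as the hash shown above. Note that rows where both columns are nil form a group of their own, just as the [nil, nil] entry does in the real result.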

Answered by Taryn East

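That error occurs because Postgres requires every column in the SELECT list to either appear in the GROUP BY clause or be used in an aggregate function. The original query selects all columns (including id) but groups only by first and email.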

Try:

User.select(:first,:email).group(:first,:email).having("count(*) > 1").all

(note: not tested, you may need to tweak it)

EDITED to remove id column

Answered by Ben Aubin

If you need the full models, try the following (based on @newUserNameHere's answer).

User.where(email: User.select(:email).group(:email).having("count(*) > 1"))

This will return the rows where the email address of the row is not unique.

I'm not aware of a way to do this over multiple attributes.
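
For a small table, one possible workaround is to group the loaded records in Ruby rather than in SQL. The sketch below is an in-memory illustration under that assumption, not a database-side solution: hashes stand in for User records, the sample data is made up, and rows_with_duplicates is a hypothetical helper name. With real models the same idea would presumably be applied to the loaded records, grouping on [user.first, user.email], at the cost of loading every row.

```ruby
# In-memory illustration only; hashes stand in for loaded User records.
# rows_with_duplicates is a hypothetical helper, not part of Active Record.
def rows_with_duplicates(rows, *keys)
  rows
    .group_by { |row| row.values_at(*keys) } # bucket rows by the chosen columns
    .values
    .select { |group| group.size > 1 }       # keep buckets holding more than one row
    .flatten(1)                              # return the full rows, not just the keys
end

# Made-up sample data.
rows = [
  { id: 1, first: "Joe", email: "joe@example.com" },
  { id: 2, first: "Joe", email: "joe@example.com" },
  { id: 3, first: "Joe", email: "other@example.com" },
  { id: 4, first: "Jim", email: "jim@example.com" }
]

rows_with_duplicates(rows, :first, :email).each { |row| p row }
```

Only the rows sharing both first and email are returned; the row with id 3 shares a first name but not an email, so it is left out.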

Answered by itsnikolay

Get all duplicates (every row after the first in each group) with a single query if you use PostgreSQL:

def duplicated_users
  duplicated_ids = User
    .group(:first, :email)
    .having("COUNT(*) > 1")
    .select('unnest((array_agg("id"))[2:])')

  User.where(id: duplicated_ids)
end

irb> duplicated_users
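
The `[2:]` slice is the part that is easy to misread: Postgres arrays are 1-indexed, so it drops the first id of every group and keeps the rest. The plain-Ruby sketch below mimics that behaviour in memory to show which rows come back. It is only an illustration: hashes stand in for user rows, the sample data is made up, and extra_copy_ids is a hypothetical helper name.

```ruby
# In-memory illustration only; hashes stand in for rows of the users table.
# extra_copy_ids is a hypothetical helper, not part of Active Record.
def extra_copy_ids(rows, *keys)
  rows
    .group_by { |row| row.values_at(*keys) }                   # like GROUP BY first, email
    .values
    .flat_map { |group| group.map { |row| row[:id] }.drop(1) } # like (array_agg("id"))[2:]
end

# Made-up sample data.
rows = [
  { id: 1, first: "Joe", email: "joe@example.com" },
  { id: 2, first: "Joe", email: "joe@example.com" },
  { id: 3, first: "Joe", email: "joe@example.com" },
  { id: 4, first: "Jim", email: "jim@example.com" }
]

p extra_copy_ids(rows, :first, :email)
```

One row per group is deliberately left out, which is convenient when the goal is to delete the extra copies and keep one. In this sketch the row kept is the first one encountered; in the SQL version, array_agg without an ORDER BY does not guarantee which id comes first.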

Answered by Nuno Costa

Based on the answer above by @newUserNameHere, I believe the right way to show the count for each is:

res = User.select('first, email, count(1)').group(:first,:email).having('count(1) > 1')

res.each {|r| puts r.attributes } ; nil