Original source: http://stackoverflow.com/questions/21669202/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Find rows with multiple duplicate fields with Active Record, Rails & Postgres
Asked by newUserNameHere
What is the best way to find records with duplicate values across multiple columns using Postgres and Active Record?
I found this solution here:
User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )
But it doesn't seem to work with Postgres. I'm getting this error:
PG::GroupingError: ERROR: column "parts.id" must appear in the GROUP BY clause or be used in an aggregate function
Answered by newUserNameHere
Tested & Working Version
User.select(:first,:email).group(:first,:email).having("count(*) > 1")
Also, this is a little unrelated but handy: if you want to see how many times each combination was found, put .size at the end:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").size
and you'll get a result set back that looks like this:
{[nil, nil]=>512,
["Joe", "[email protected]"]=>23,
["Jim", "[email protected]"]=>36,
["John", "[email protected]"]=>21}
Thought that was pretty cool and hadn't seen it before.
Credit to Taryn, this is just a tweaked version of her answer.
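To make the shape of that hash concrete, here is a plain-Ruby sketch of the same grouping logic run on an in-memory array instead of the database. The method name duplicate_counts and the sample rows are made up for illustration. It is only an analogy for what GROUP BY with HAVING count(*) > 1 computes, not how Active Record implements .size.

```ruby
# Hypothetical helper: count how often each combination of the given
# keys occurs, keeping only combinations that appear more than once.
def duplicate_counts(rows, *keys)
  rows
    .group_by { |row| keys.map { |key| row[key] } } # like GROUP BY first, email
    .transform_values(&:size)                       # like count(*)
    .select { |_combo, count| count > 1 }           # like HAVING count(*) > 1
end

# Made-up sample data standing in for the users table.
sample_users = [
  { id: 1, first: "Joe", email: "joe@example.com" },
  { id: 2, first: "Joe", email: "joe@example.com" },
  { id: 3, first: "Jim", email: "jim@example.com" },
  { id: 4, first: nil,   email: nil },
  { id: 5, first: nil,   email: nil }
]

p duplicate_counts(sample_users, :first, :email)
```

As in the database result, the keys are arrays of the grouped values and the values are the counts, and rows where both fields are nil form a group of their own.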
Answered by Taryn East
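That error occurs because Postgres requires every column in the SELECT list to either appear in the GROUP BY clause or be used in an aggregate function. The original query selects all columns, including id, but groups by only two of them.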
Try:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").all
(Note: not tested, you may need to tweak it.)
EDITED to remove id column
Answered by Ben Aubin
If you need the full models, try the following (based on @newUserNameHere's answer).
User.where(email: User.select(:email).group(:email).having("count(*) > 1"))
This returns every row whose email address is not unique.
I'm not aware of a way to do this over multiple attributes.
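One possible way to extend this to several attributes on Postgres is a row-value IN subquery. The sketch below only builds the SQL fragment as a string. The helper name duplicate_rows_condition is hypothetical (it is not part of Rails), and it assumes the table and column names are trusted identifiers rather than user input.

```ruby
# Hypothetical helper (not part of Rails): builds a WHERE fragment matching
# rows whose combination of `columns` occurs more than once in `table`.
# Assumes `table` and `columns` are trusted identifiers, never user input.
def duplicate_rows_condition(table, columns)
  list = columns.join(", ")
  "(#{list}) IN (SELECT #{list} FROM #{table} " \
    "GROUP BY #{list} HAVING COUNT(*) > 1)"
end

condition = duplicate_rows_condition("users", %w[first email])
puts condition
# Intended use inside a Rails app (not run here): User.where(condition)
```

Be aware that a row-value IN comparison never matches when one of the compared columns is NULL, so rows like the [nil, nil] group in the earlier answer would not be returned by this condition.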
Answered by itsnikolay
Get all duplicates with a single query if you use PostgreSQL:
def duplicated_users
  duplicated_ids = User
    .group(:first, :email)
    .having("COUNT(*) > 1")
    .select('unnest((array_agg("id"))[2:])')
  User.where(id: duplicated_ids)
end
irb> duplicated_users
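Note what the [2:] slice does: Postgres arrays are 1-indexed, so it drops the first id of each group and keeps the rest. The query therefore returns the surplus rows, not every member of a duplicate group. Since array_agg is used without ORDER BY, which row counts as "first" is not guaranteed. The plain-Ruby sketch below mimics that selection on in-memory data; the method name surplus_duplicate_ids and the sample rows are invented for illustration.

```ruby
# Hypothetical in-memory analogue: return the ids of every row except the
# first one in each group of rows that share the same values for `keys`.
def surplus_duplicate_ids(rows, *keys)
  rows
    .group_by { |row| keys.map { |key| row[key] } } # like GROUP BY first, email
    .values
    .select { |group| group.size > 1 }              # like HAVING COUNT(*) > 1
    .flat_map { |group| group.drop(1).map { |row| row[:id] } } # like (array_agg(id))[2:]
end

# Made-up sample data standing in for the users table.
people = [
  { id: 1, first: "Joe", email: "joe@example.com" },
  { id: 2, first: "Joe", email: "joe@example.com" },
  { id: 3, first: "Joe", email: "joe@example.com" },
  { id: 4, first: "Jim", email: "jim@example.com" }
]

p surplus_duplicate_ids(people, :first, :email)
```

This shape is convenient when the goal is to delete or merge the extra copies while keeping one row per combination.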
Answered by Nuno Costa
Based on the answer above by @newUserNameHere, I believe the right way to show the count for each combination is:
res = User.select('first, email, count(1)').group(:first,:email).having('count(1) > 1')
res.each {|r| puts r.attributes } ; nil

