Remove duplicates in a Django query
Source: http://stackoverflow.com/questions/5877306/
This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Asked by David542
Is there a simple way to remove duplicates in the following basic query --
email_list = Emails.objects.order_by('email')
I tried using duplicate() but it was not working. Could you please show me the exact syntax for doing this query without duplicates? Thank you.
Answered by Daniel Roseman
This query will not itself produce duplicates, i.e., it will give you all the rows in the database, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field, so each id+email combination is different and no row is filtered out.
Assuming you only need one field, email, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
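To see why distinct() over whole rows changes nothing while distinct() over a single column does, here is a minimal sketch in plain SQL using Python's built-in sqlite3 module rather than the Django ORM. The table name email, its columns, and the sample addresses are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical table standing in for the Email model: an automatic id plus one email column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# DISTINCT over (id, email): every row differs by id, so nothing is removed.
all_rows = conn.execute("SELECT DISTINCT id, email FROM email ORDER BY id").fetchall()

# DISTINCT over email alone: duplicate addresses collapse into one.
unique_emails = [row[0] for row in conn.execute("SELECT DISTINCT email FROM email ORDER BY email")]

print(len(all_rows))    # 3
print(unique_emails)    # ['a@example.com', 'b@example.com']
```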
However, you should really fix the root problem, and remove the duplicate data from your database.
For example, deleting duplicate Emails by the email field:
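for email in Email.objects.values_list('email', flat=True).distinct():
    # Keep the row with the lowest pk; collect the rest as a list, since some backends reject a sliced subquery in __in.
    duplicate_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=duplicate_pks).delete()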
Or books by name:
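for name in Book.objects.values_list('name', flat=True).distinct():
    # Keep the Book with the lowest pk for each name and delete the others.
    duplicate_pks = list(Book.objects.filter(name=name).order_by('pk').values_list('pk', flat=True)[1:])
    Book.objects.filter(pk__in=duplicate_pks).delete()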
Answered by Parag Tyagi -morpheus-
To check for duplicates, you can do the equivalent of GROUP BY and HAVING in Django as below, using Django annotations.
from django.db.models import Count
from app.models import Email
duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
Now loop through the above data and delete all emails except the first one (adjust this to your requirements).
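for data in duplicate_emails:
    email = data['email']
    # Django refuses to delete a sliced queryset, so collect the extra pks first.
    extra_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=extra_pks).delete()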
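The same check and cleanup can also be expressed directly in SQL. The sketch below uses Python's built-in sqlite3 module instead of the Django ORM; the table name email and the sample rows are hypothetical. It is a set-based alternative to the loop above, not the answer's own method: one GROUP BY ... HAVING query finds the duplicated addresses, and one DELETE keeps only the lowest id for each address.

```python
import sqlite3

# Hypothetical table and sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# GROUP BY + HAVING: addresses that occur more than once, with their counts.
duplicates = conn.execute(
    "SELECT email, COUNT(*) FROM email GROUP BY email HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [('a@example.com', 3)]

# Delete every row except the one with the lowest id for each address.
conn.execute("DELETE FROM email WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)")
remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```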
Answered by zeekay
You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Answered by Michael C. O'Connor
You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field from the model, you could do something like:
email_list = Emails.objects.values_list('email').order_by('email').distinct()
which should give you an ordered list of emails. Without flat=True each entry is a one-element tuple; pass flat=True to values_list() to get the plain values.
Answered by SuperNova
You can also use set():
email_list = set(Emails.objects.values_list('email', flat=True))
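Note that a set does not keep any particular order, so the ordering from the original query would be lost. A minimal sketch of an order-preserving alternative in plain Python, where the list emails is a made-up stand-in for the values returned by values_list('email', flat=True):

```python
# Hypothetical stand-in for an ordered queryset result.
emails = ["a@example.com", "a@example.com", "b@example.com", "c@example.com", "b@example.com"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so repeats are dropped while the first occurrence keeps its position.
email_list = list(dict.fromkeys(emails))
print(email_list)  # ['a@example.com', 'b@example.com', 'c@example.com']
```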
Answered by Chris Montanaro
I used the following to actually remove the duplicate entries from the database. Hopefully this helps someone else.
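adds = Address.objects.all()
# distinct() with field names is only supported on PostgreSQL.
d = adds.order_by('latitude', 'longitude').distinct('latitude', 'longitude')
for address in adds:
    # Delete every address that is not the one kept for its coordinates.
    if address not in d:
        address.delete()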
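On database backends where distinct() with field names is not available (Django supports it only on PostgreSQL), the same idea can be sketched in plain Python by remembering which coordinate pairs have already been seen. The Address tuple and the sample rows below are hypothetical stand-ins for model instances; in Django the collected ids could then be passed to a filter(pk__in=...).delete() call.

```python
from collections import namedtuple

# Hypothetical stand-in for Address model instances.
Address = namedtuple("Address", ["id", "latitude", "longitude"])
addresses = [
    Address(1, 40.0, -75.0),
    Address(2, 40.0, -75.0),
    Address(3, 41.5, -73.2),
    Address(4, 40.0, -75.0),
]

seen = set()          # coordinate pairs already encountered
duplicate_ids = []    # ids of rows that repeat an earlier pair
for address in addresses:
    key = (address.latitude, address.longitude)
    if key in seen:
        duplicate_ids.append(address.id)
    else:
        seen.add(key)

print(duplicate_ids)  # [2, 4]
```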