SQL: Remove duplicates in a Django query

Notice: this page is a mirrored translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5877306/

Time: 2020-09-01 10:23:53  Source: igfitidea

Remove duplicates in a django query

Tags: sql, django

Asked by David542

Is there a simple way to remove duplicates in the following basic query --

email_list = Emails.objects.order_by('email')

I tried using duplicate() but it was not working. Could you please show me the exact syntax for doing this query without duplicates? Thank you.

Answered by Daniel Roseman

This query will not produce duplicates by itself, i.e., it will give you all the rows in the database table, ordered by email.

However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field, so the combination of id + email is always unique and no rows are collapsed.
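
The effect can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module; the table name and sample addresses are made up for illustration and only stand in for the Email model.

```python
import sqlite3

# Hypothetical table standing in for the Email model:
# an automatic integer id plus an email column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# DISTINCT over (id, email): every row has its own id, so nothing collapses.
all_columns = conn.execute("SELECT DISTINCT id, email FROM email").fetchall()

# DISTINCT over email alone: the repeated address collapses into one row.
email_only = conn.execute(
    "SELECT DISTINCT email FROM email ORDER BY email"
).fetchall()

print(len(all_columns))  # 3
print(email_only)        # [('a@example.com',), ('b@example.com',)]
```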

Assuming you only need one field, email, de-duplicated, you can do this:

email_list = Email.objects.values_list('email', flat=True).distinct()

However, you should really fix the root problem, and remove the duplicate data from your database.

For example, to delete duplicate Email rows by the email field:

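for email in Email.objects.values_list('email', flat=True).distinct():
    # Order by pk so the oldest row is kept; list() avoids a sliced subquery, which some databases reject.
    duplicate_ids = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=duplicate_ids).delete()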

Or books by name:

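for name in Book.objects.values_list('name', flat=True).distinct():
    # Keep the first Book with this name (lowest pk) and delete the rest.
    duplicate_ids = list(Book.objects.filter(name=name).order_by('pk').values_list('pk', flat=True)[1:])
    Book.objects.filter(pk__in=duplicate_ids).delete()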
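
For comparison, the same "keep one row per value" clean-up can be written as a single SQL statement. This is only a sketch against an in-memory SQLite database with a hypothetical email table; the statement is not taken from the answer above, and other databases may need a different form (MySQL, for instance, restricts subqueries on the table being deleted from).

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the clean-up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# Keep the row with the smallest id for each address and delete the others.
conn.execute(
    "DELETE FROM email WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)"
)

remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```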

Answered by Parag Tyagi -morpheus-

To check for duplicates, you can do a GROUP BY and HAVING in Django as shown below. We are using Django annotations here.

from django.db.models import Count
from app.models import Email

duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
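
As a rough illustration of what that queryset asks the database for, here is the corresponding GROUP BY / HAVING query run through sqlite3. The table and data are hypothetical, and the SQL that Django actually generates may differ in detail.

```python
import sqlite3

# Hypothetical table and data standing in for the Email model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("c@example.com",)],
)

# Group rows by address and keep only the groups with more than one row.
duplicate_rows = conn.execute(
    "SELECT email, COUNT(email) AS email_count "
    "FROM email GROUP BY email HAVING COUNT(email) > 1"
).fetchall()

print(duplicate_rows)  # [('a@example.com', 2)]
```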

Now loop through the above data and delete all the emails except the first one (which one to keep depends on your requirements).

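for data in duplicate_emails:
    email = data['email']
    # Django cannot delete a sliced queryset directly, so collect the pks to remove first.
    extra_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=extra_pks).delete()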

Answered by zeekay

You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct

Answered by Michael C. O'Connor

You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field from the model, you could do something like:

email_list = Emails.objects.values_list('email', flat=True).order_by('email').distinct()

which should give you an ordered list of emails.

Answered by SuperNova

You can also use set()

email_list = set(Emails.objects.values_list('email', flat=True))
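
Note that a set does not preserve the order of the results. A small sketch with a plain list of made-up addresses (standing in for the values the queryset would return) shows two ways to get a de-duplicated list with a predictable order:

```python
# Hypothetical values, standing in for the result of values_list('email', flat=True).
emails = [
    "b@example.com",
    "a@example.com",
    "b@example.com",
    "c@example.com",
    "a@example.com",
]

# A set removes duplicates but has no defined order, so sort it explicitly.
unique_sorted = sorted(set(emails))

# dict.fromkeys() removes duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(emails))

print(unique_sorted)    # ['a@example.com', 'b@example.com', 'c@example.com']
print(unique_in_order)  # ['b@example.com', 'a@example.com', 'c@example.com']
```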

Answered by Chris Montanaro

I used the following to actually remove the duplicate entries from the database; hopefully this helps someone else.

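adds = Address.objects.all()
# distinct() with field names is only supported on PostgreSQL.
# Evaluate it up front so the rows to keep are fixed before anything is deleted.
kept = list(adds.distinct('latitude', 'longitude'))
for address in adds:
    if address not in kept:
        address.delete()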
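
The keep-one-per-coordinate logic can also be sketched in plain Python, which does not depend on PostgreSQL's field-level distinct(). The AddressRow type and split_duplicates helper below are hypothetical stand-ins for illustration, not part of Django or of the answer above.

```python
from typing import List, NamedTuple, Tuple


class AddressRow(NamedTuple):
    """Hypothetical stand-in for a row of the Address model."""
    pk: int
    latitude: float
    longitude: float


def split_duplicates(rows: List[AddressRow]) -> Tuple[List[AddressRow], List[AddressRow]]:
    """Return (keep, drop): the first row per (latitude, longitude), and the rest."""
    seen = set()
    keep, drop = [], []
    # Walk rows in pk order so the oldest row of each pair is the one kept.
    for row in sorted(rows, key=lambda r: r.pk):
        key = (row.latitude, row.longitude)
        if key in seen:
            drop.append(row)
        else:
            seen.add(key)
            keep.append(row)
    return keep, drop


rows = [AddressRow(1, 52.5, 13.4), AddressRow(2, 48.1, 11.6), AddressRow(3, 52.5, 13.4)]
keep, drop = split_duplicates(rows)
print([r.pk for r in keep])  # [1, 2]
print([r.pk for r in drop])  # [3]
```

With real models, the pks collected in drop could then be passed to a filter(pk__in=...).delete() call, as in the earlier answers.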