Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/40521860/

Date: 2020-09-12 11:30:49  Source: igfitidea

In Excel, remove duplicates from one column based on the values in another column, either through VBA or a combination of formulas/functions

Tags: excel, excel-vba, excel-formula, excel-2010, vba

Asked by Monomeeth

I'm having trouble achieving this in an accurate and automated way. I've tried the approaches discussed here, here and here, but none of them work in my scenario.

I have a spreadsheet with thousands of rows of data. Data is organised as follows:

  • Column A contains IP addresses in General format
  • Column B contains Date/Time in the following Custom format (d/mm/yyyy h:mm)
  • Column C contains duration in the following Custom format (h:mm:ss)

This data contains a number of duplicates I need to remove, based on the IP address in Column A. However, the criterion I need is to remove whichever duplicates do not have the longest duration. To better explain my scenario, see the sample image below:

[Image: sample data containing duplicate IP addresses]

I need a way to remove all duplicates of a particular IP address that do not contain the longest duration for that IP address. So, using the above example, row 3 would be deleted because the duration of 1 minute is shorter than 36 minutes in row 4 that contains the same IP address.

Another example is that rows 5, 6 and 7 would also be removed as all their durations are shorter than row 8 which has the same IP address but a longer duration. Of course, any rows already containing unique IP addresses would be left alone. The end result using my above sample would be as follows:

[Image: the sample data after the shorter-duration duplicates are removed]

Of course, in my sample above all the data was nicely sorted by IP address first and Duration second. In real life this isn't the case, but that's something easy enough for me to do prior to any solution, if necessary.

The key thing is that in some cases an IP address may be duplicated once, while in others it may be duplicated many times over. I just need to ensure that only the row with the longest duration remains. In the event that multiple instances of an IP address have the same longest duration, I want them all kept. That is, if an IP address is repeated ten times and its longest duration is an hour for two of those times, then both of them need to remain.

I'm happy with any solution for this, be it using formulas, functions or macros.

Answered by Alex Frolov

You can solve this using a helper column (column D).

  1. Enter the following array formula in cell D2:

    =IF($C2=MAX(IF($A2=$A$2:$A$50,$C$2:$C$50,-1)),"Remain","Remove")

    where 50 is the last row of your table. For each row, the formula takes the maximum duration among the rows with the same IP address and compares it with that row's own duration.

    Remember to press Ctrl+Shift+Enter to complete the array formula correctly.

  2. Copy/paste the formula to the other cells in column D.

  3. Apply a filter to column D for the value "Remove".

  4. Delete the filtered rows.
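
Since the question also allows macros, the same rule can be sketched in VBA without a helper column. This is only an illustration, not part of the original answer: the procedure name `RemoveShorterDuplicates` is made up, and it assumes headers in row 1, data from row 2 on the active sheet, IP addresses in column A, and durations in column C stored as real time values rather than text. It uses a late-bound `Scripting.Dictionary`, which is generally only available on Windows.

```vba
Option Explicit

' Sketch: keep only rows whose duration (column C) equals the longest
' duration found for that row's IP address (column A).
' Assumption: headers in row 1, data from row 2, on the active sheet.
Sub RemoveShorterDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim maxDur As Object      ' late-bound Scripting.Dictionary: IP -> longest duration
    Dim ip As String
    Dim dur As Double

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Set maxDur = CreateObject("Scripting.Dictionary")

    ' Pass 1: record the longest duration seen for each IP address.
    For r = 2 To lastRow
        ip = CStr(ws.Cells(r, "A").Value2)
        dur = ws.Cells(r, "C").Value2
        If Not maxDur.Exists(ip) Then
            maxDur.Add ip, dur
        ElseIf dur > maxDur(ip) Then
            maxDur(ip) = dur
        End If
    Next r

    ' Pass 2: delete from the bottom up so the remaining row numbers stay valid.
    ' Rows that tie for the longest duration are not deleted.
    Application.ScreenUpdating = False
    For r = lastRow To 2 Step -1
        ip = CStr(ws.Cells(r, "A").Value2)
        If ws.Cells(r, "C").Value2 < maxDur(ip) Then ws.Rows(r).Delete
    Next r
    Application.ScreenUpdating = True
End Sub
```

The data does not need to be sorted first, because the first pass scans every row before anything is deleted. Row deletion by a macro cannot be undone with Ctrl+Z, so it is safer to try this on a copy of the workbook.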
