java 面试题:从未排序的链表中删除重复项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4542600/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Interview question: remove duplicates from an unsorted linked list
提问by Kiril
I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions, and I'm trying to solve the following question:
我正在阅读Cracking the Coding Interview,第四版:150 个编程面试问题和解决方案,我正在尝试解决以下问题:
2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?
2.1 编写代码从未排序的链表中删除重复项。跟进:如果不允许使用临时缓冲区,您将如何解决此问题?
I'm solving it in C#, so I made my own Node class:
我正在用 C# 解决它，所以我创建了自己的 Node 类：
public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}
My solution is to iterate through the list, then for each node iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):
我的解决方案是遍历列表,然后对每个节点遍历列表的其余部分并删除任何重复项(请注意,我并没有按照书中的说明实际编译或测试它):
public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while (iter != null)
    {
        // Iterate through the remaining nodes in the list
        Node<T> current = iter;
        while (current != null && current.Next != null)
        {
            if (iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }
            current = current.Next;
        }
        iter = iter.Next;
    }
}
Here is the solution from the book (the author wrote it in java):
这是书中的解决方案(作者是用java写的):
Without a buffer, we can iterate with two pointers: “current” does a normal iteration, while “runner” iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.
在没有缓冲区的情况下,我们可以使用两个指针进行迭代:“current”进行正常迭代,而“runner”则遍历所有先前的节点以检查重复项。Runner 只会看到每个节点一个 dup,因为如果有多个重复项,它们早就被删除了。
public static void deleteDups2(LinkedListNode head)
{
    if (head == null) return;
    LinkedListNode previous = head;
    LinkedListNode current = previous.next;
    while (current != null)
    {
        LinkedListNode runner = head;
        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data)
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}
So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?
所以我的解决方案总是寻找当前节点到最后的重复项,而他们的解决方案寻找从头到当前节点的重复项。我觉得这两种解决方案都会遇到性能问题,具体取决于列表中有多少重复项以及它们的分布方式(密度和位置)。但总的来说:我的答案是几乎和书中的答案一样好,还是明显更差?
回答by Merlyn Morgan-Graham
If you give a person a fish, they eat for a day. If you teach a person to fish...
如果你给一个人一条鱼,他们可以吃一天。如果你教一个人钓鱼...
My measures for the quality of an implementation are:
我对实施质量的衡量标准是:
- Correctness: If you aren't getting the right answer in all cases, then it isn't ready
- Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. Look at any number of books focused on refactoring, programming best-practices, coding standards, etc, if you want more information on this.
- Theoretical performance (worst-case and amortized): Big-O is a metric you can use. CPU and memory consumption should both be measured
- Complexity: Estimate how long it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is
- 正确性:如果你在所有情况下都没有得到正确的答案,那么它还没有准备好
- 可读性/可维护性：查看代码重复、命名是否易懂、每个块/方法的代码行数（以及每个块所做事情的数量），以及跟踪代码流程的难度。如果您想了解更多相关信息，可以查看任何一本关于重构、编程最佳实践、编码标准等主题的书籍。
- 理论性能(最坏情况和摊销): Big-O是您可以使用的指标。应同时测量 CPU 和内存消耗
- 复杂性：估计一个普通的专业程序员（在已经知道算法的情况下）需要多长时间才能实现。看看这是否符合问题的实际难度
As for your implementation:
至于你的实现:
- Correctness: I suggest writing unit tests to determine this for yourself and/or debugging it (on paper) from start to finish with interesting sample/edge cases: null, one item, two items, various numbers of duplicates, etc. (a small test harness is sketched after this list)
- Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. It is a bit more obvious what your code does than the code in the book
- Performance: I believe both are N-squared. Whether the amortized cost is lower on one or the other I'll let you figure out :)
- Time to implement: An average professional should be able to code this algorithm in their sleep, so looking good
- 正确性：我建议自己编写单元测试来确定这一点，和/或用有趣的示例/边缘情况（空链表、一个元素、两个元素、不同数量的重复项等）从头到尾（在纸上）调试它（本列表之后附有一个小的测试示例）
- 可读性/可维护性:虽然您的最后两条评论没有添加任何内容,但看起来基本不错。你的代码所做的事情比书中的代码更明显一点
- 性能:我相信两者都是 N 平方的。一个或另一个的摊销成本是否较低,我会让你弄清楚:)
- 实现时间：一般的专业程序员应该能闭着眼睛写出这个算法，所以这方面没问题
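To make the correctness checklist concrete, here is a minimal, self-contained Java harness exercising the book's deleteDups2 on exactly those edge cases. The LinkedListNode class and the fromArray/render helpers are illustrative additions (not part of the original answer); the deleteDups2 body follows the book's solution quoted above.

class LinkedListNode {
    int data;
    LinkedListNode next;
    LinkedListNode(int data) { this.data = data; }
}

public class DedupEdgeCases {
    // deleteDups2 as given in the book's solution quoted in the question
    public static void deleteDups2(LinkedListNode head) {
        if (head == null) return;
        LinkedListNode previous = head;
        LinkedListNode current = previous.next;
        while (current != null) {
            LinkedListNode runner = head;
            while (runner != current) {            // check for earlier dups
                if (runner.data == current.data) {
                    LinkedListNode tmp = current.next;
                    previous.next = tmp;           // remove current
                    current = tmp;
                    break;
                }
                runner = runner.next;
            }
            if (runner == current) {               // current was not removed
                previous = current;
                current = current.next;
            }
        }
    }

    // illustrative helper: build a singly linked list from the given values
    static LinkedListNode fromArray(int... values) {
        LinkedListNode head = null, tail = null;
        for (int v : values) {
            LinkedListNode n = new LinkedListNode(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // illustrative helper: render the list as "1 2 3"
    static String render(LinkedListNode head) {
        StringBuilder sb = new StringBuilder();
        for (LinkedListNode n = head; n != null; n = n.next)
            sb.append(n.data).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        deleteDups2(null);                         // null list: must not throw

        LinkedListNode one = fromArray(42);
        deleteDups2(one);
        System.out.println(render(one));           // expected: 42

        LinkedListNode pair = fromArray(7, 7);
        deleteDups2(pair);
        System.out.println(render(pair));          // expected: 7

        LinkedListNode mixed = fromArray(1, 2, 1, 3, 2, 2);
        deleteDups2(mixed);
        System.out.println(render(mixed));         // expected: 1 2 3
    }
}

The same edge cases are worth running (on paper or in code) against the C# version in the question.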
回答by Rune FS
There's not much of a difference. If I've done my math right, yours is on average N/16 slower than the author's, but plenty of cases exist where your implementation will be faster.
没有太大区别。如果我的数学计算正确，您的实现平均比作者的慢 N/16，但也存在大量您的实现更快的情况。
Edit:
编辑:
I'll call your implementation Y and the author's A
我将你的实现称为 Y，将作者的称为 A。
Both proposed solutions have O(N^2) as worst case, and they both have a best case of O(N) when all elements are the same value.
两种提议的解决方案都将 O(N^2) 作为最坏情况,并且当所有元素都具有相同的值时,它们都具有 O(N) 的最佳情况。
EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence with a random size and a random distribution. What would the average case be?
编辑：这是一次完整的重写。受到评论中讨论的启发，我试图找出 N 个随机数的平均情况，即一个具有随机大小和随机分布的序列。平均情况会是怎样的？
Y will always run U times, where U is the number of unique numbers. For each iteration it will do N-X comparisons, where X is the number of elements removed prior to that iteration (+1). The first time no elements will have been removed, and on average N/U will have been removed by the second iteration.
Y 将始终运行 U 次，其中 U 是唯一数字的数量。每次迭代它将进行 N-X 次比较，其中 X 是该次迭代之前已移除的元素数（+1）。第一次没有元素被移除，平均而言到第二次迭代时已移除 N/U 个。
That is, on average ?N will be left to iterate, so we can express the average cost as U*?N. The average U can be expressed based on N as well (0 < U <= N).
也就是说，平均有 ?N 个元素留待迭代，因此平均成本可表示为 U*?N。平均 U 也可以用 N 来表示（0 < U <= N）。
Expressing A becomes more difficult. Let's say we use I iterations before we've encountered all unique values. After that it will run between 1 and U comparisons (on average that's U/2) and will do that N-I times.
表达 A 变得更加困难。假设在遇到所有唯一值之前我们用了 I 次迭代。之后每次将进行 1 到 U 次比较（平均为 U/2），并且会执行 N-I 次。
I*c + U/2*(N-I)
I*c + U/2*(N-I)
But what's the average number of comparisons (c) we run for the first I iterations? On average we need to compare against half of the elements already visited, and on average we've visited I/2 elements, i.e. c = I/4.
但是前 I 次迭代中我们平均进行多少次比较 (c)？平均而言，我们需要与已经访问过的元素的一半进行比较，而平均我们已经访问了 I/2 个元素，即 c = I/4。
(I^2)/4 + U/2*(N-I).
(I^2)/4 + U/2*(N-I)。
I can be expressed in terms of N: on average we'll need to visit half of N to find all the unique values, so I = N/2, yielding an average of
I 可以用 N 表示：平均而言，我们需要访问 N 的一半才能找到所有唯一值，因此 I = N/2，得到的平均值为
(I^2)/4 + U/2*(N-I), which can be reduced to (3*N^2)/16.
(I^2)/4 + U/2*(N-I)，可以化简为 (3*N^2)/16。
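Spelling out that reduction, assuming (as the stated result requires) an average of U = N/2 unique values:

$$\frac{I^2}{4} + \frac{U}{2}(N-I)\;\Big|_{I=N/2,\;U=N/2} \;=\; \frac{(N/2)^2}{4} + \frac{N/2}{2}\cdot\frac{N}{2} \;=\; \frac{N^2}{16} + \frac{N^2}{8} \;=\; \frac{3N^2}{16}$$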
That is of course assuming my estimates of the averages are correct. On average, for any potential sequence, A does N/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are about equal when compared on the number of comparisons.
当然，这前提是我对平均值的估计是正确的。平均而言，对于任何可能的序列，A 的比较次数比 Y 少 N/16，但也存在大量 Y 比 A 快的情况。所以就比较次数而言，我认为两者大致相当。
回答by denniss
How about using a HashMap? This way it will take O(n) time and O(n) space. I will write pseudocode.
使用 HashMap 怎么样?这样它将花费 O(n) 时间和 O(n) 空间。我会写伪代码。
function removeDup(LinkedList list){
    HashMap map = new HashMap();
    for(i = 0; i < list.length; i++)
        if list.get(i) not in map
            map.add(list.get(i))
        else
            list.remove(i)
            i--   // stay at this index: the element after the removed one has shifted into it
        end
    end
end
Of course we assume that HashMap has O(1) read and write.
当然我们假设HashMap有O(1)读写。
Another solution is to use mergesort and then remove duplicates from start to end of the list. This takes O(n log n).
另一种解决方案是使用归并排序，然后从列表的开头到结尾删除重复项。这需要 O(n log n)。
Mergesort is O(n log n); removing duplicates from a sorted list is O(n). Do you know why? Therefore the entire operation takes O(n log n).
归并排序是 O(n log n)，从已排序的列表中删除重复项是 O(n)。你知道为什么吗？因此整个操作需要 O(n log n)。
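The reason the cleanup pass is O(n): after sorting, equal values sit next to each other, so a single traversal can unlink them all. A minimal sketch of that pass (the SortedListNode type and method name are illustrative stand-ins, not from the answer; the merge sort itself is omitted):

// Sketch only: removes duplicates from an already-sorted singly linked list in one pass.
class SortedListNode {
    int data;
    SortedListNode next;
    SortedListNode(int data) { this.data = data; }
}

class SortedDedup {
    // O(n): each node is visited once; equal neighbours are unlinked.
    static void removeAdjacentDups(SortedListNode head) {
        SortedListNode current = head;
        while (current != null && current.next != null) {
            if (current.data == current.next.data) {
                current.next = current.next.next; // skip the duplicate
            } else {
                current = current.next;           // values differ, advance
            }
        }
    }
}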
回答by denniss
Heapsort is an in-place sort. You could modify the "siftUp" or "siftDown" function to simply remove the element if it encounters a parent that is equal. This would be O(n log n).
堆排序是一种就地排序。如果元素遇到相等的父元素，您可以修改"siftUp"或"siftDown"函数来直接删除该元素。这将是 O(n log n)。
function siftUp(a, start, end) is
    input: start represents the limit of how far up the heap to sift.
           end is the node to sift up.
    child := end
    while child > start
        parent := floor((child - 1) ÷ 2)
        if a[parent] < a[child] then (out of max-heap order)
            swap(a[parent], a[child])
            child := parent (repeat to continue sifting up the parent now)
        else if a[parent] == a[child] then
            remove a[parent]
        else
            return
回答by Duke
Code in java:
Java中的代码:
public static void dedup(Node head) {
    Node cur = null;
    HashSet encountered = new HashSet();

    while (head != null) {
        encountered.add(head.data);
        cur = head;
        while (cur.next != null) {
            if (encountered.contains(cur.next.data)) {
                cur.next = cur.next.next;
            } else {
                break;
            }
        }
        head = cur.next;
    }
}
回答by user2864458
Tried the same in cpp. Please let me know your comments on this.
在 cpp 中尝试了相同的方法。请让我知道您对此的评论。
// ConsoleApplication2.cpp : Defines the entry point for the console application. //
// ConsoleApplication2.cpp : 定义控制台应用程序的入口点。//
#include "stdafx.h"
#include <stdlib.h>
struct node
{
int data;
struct node *next;
};
struct node *head = (node*)malloc(sizeof(node));
struct node *tail = (node*)malloc(sizeof(node));
struct node* createNode(int data)
{
struct node *newNode = (node*)malloc(sizeof(node));
newNode->data = data;
newNode->next = NULL;
head = newNode;
return newNode;
}
bool insertAfter(node *list, int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    //case 1 - no anchor node given: insert at the head
    if (!list)
    {
        newNode->data = data;
        newNode->next = head;
        head = newNode;
        return true;
    }
    //case 2 - middle or tail of list: walk the list to find the anchor node
    struct node *curpos = head;
    while (curpos)
    {
        if (curpos == list)
        {
            newNode->data = data;
            if (curpos->next == NULL)
            {
                newNode->next = NULL;
                tail = newNode;
            }
            else
            {
                newNode->next = curpos->next;
            }
            curpos->next = newNode;
            return true;
        }
        curpos = curpos->next;
    }
    free(newNode); //anchor node not found in the list
    return false;
}
void deleteNode(node *runner, node *curr)
{
    // the node to remove is runner->next (curr is kept only for the original call signature)
    struct node *dup = runner->next;
    if (dup->next == NULL) //delete at tail
    {
        runner->next = NULL;
        tail = runner;
    }
    else //delete in the middle
    {
        runner->next = dup->next;
    }
    free(dup);
}
void removedups(node *list)
{
    struct node *curr = head;
    struct node *runner = curr;
    while (curr != NULL)
    {
        runner = curr;
        while (runner->next != NULL)
        {
            if (curr->data == runner->next->data)
            {
                // remove runner->next, then re-check the new runner->next
                deleteNode(runner, curr);
            }
            else
            {
                runner = runner->next;
            }
        }
        curr = curr->next;
    }
}
int _tmain(int argc, _TCHAR* argv[])
{
    struct node *list = createNode(1);
    insertAfter(list, 2);
    insertAfter(list, 2);
    insertAfter(list, 3);
    removedups(list);
    return 0;
}
回答by Prashant Rathi
Code in C:
C中的代码:
void removeduplicates(N **r)
{
    N *temp1 = *r;
    N *temp2 = NULL;
    N *temp3 = NULL;
    if (temp1 == NULL) /* empty list: nothing to do */
        return;
    while (temp1->next != NULL)
    {
        temp2 = temp1;
        while (temp2 != NULL)
        {
            temp3 = temp2;
            temp2 = temp2->next;
            if (temp2 == NULL)
            {
                break;
            }
            if ((temp2->data) == (temp1->data))
            {
                temp3->next = temp2->next;
                free(temp2);
                temp2 = temp3;
                printf("\na dup deleted");
            }
        }
        temp1 = temp1->next;
    }
}
回答by Rupa Mistry
C# code for removing duplicates left over after the first pass of iteration:
用于删除第一轮迭代后残留的重复项的 C# 代码：
public Node removeDuplicates(Node head)
{
    if (head == null)
        return head;

    var current = head;
    while (current != null)
    {
        if (current.next != null && current.data == current.next.data)
        {
            current.next = current.next.next;
        }
        else { current = current.next; }
    }
    return head;
}
回答by VAT
Here's the implementation using HashSet, in O(n) time.
下面是使用 HashSet 在 O(n) 时间内完成的实现。
I have used a HashSet to store unique values and two node pointers to traverse the linked list. If a duplicate is found, the previous node is linked to the node after the current one, skipping the duplicate.
我使用了一个 HashSet 来存储唯一值，并用两个节点指针来遍历链表。如果发现重复项，就将前一个节点直接链接到当前节点之后的节点，从而跳过重复项。
This will ensure removal of duplicate records.
这将确保删除重复的记录。
/// <summary>
/// Write code to remove duplicates from an unsorted linked list.
/// </summary>
class RemoveDups<T>
{
    private class Node
    {
        public Node Next;
        public T Data;

        public Node(T value)
        {
            this.Data = value;
        }
    }

    private Node head = null;

    public static void MainMethod()
    {
        RemoveDups<int> rd = new RemoveDups<int>();
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(10);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(35);
        rd.PrintNodes();
        rd.RemoveDuplicates();
        Console.WriteLine("Duplicates Removed!");
        rd.PrintNodes();
    }

    private void RemoveDuplicates()
    {
        //use a hashset to remove duplicates
        HashSet<T> hs = new HashSet<T>();
        Node current = head;
        Node prev = null;

        //loop through the linked list
        while (current != null)
        {
            if (hs.Contains(current.Data))
            {
                //remove the duplicate record
                prev.Next = current.Next;
            }
            else
            {
                //insert element into hashset
                hs.Add(current.Data);
                prev = current;
            }
            current = current.Next;
        }
    }

    /// <summary>
    /// Add Node at the beginning
    /// </summary>
    /// <param name="val"></param>
    public void AddNode(T val)
    {
        Node newNode = new Node(val);
        newNode.Data = val;
        newNode.Next = head;
        head = newNode;
    }

    /// <summary>
    /// Print nodes
    /// </summary>
    public void PrintNodes()
    {
        Node current = head;
        while (current != null)
        {
            Console.WriteLine(current.Data);
            current = current.Next;
        }
    }
}