list 在没有 elem 的情况下从 Haskell 中的列表中删除重复项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16108714/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Removing duplicates from a list in Haskell without elem
提问by BradStevenson
I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:
我正在尝试定义一个函数,该函数将从列表中删除重复项。到目前为止,我有一个有效的实现:
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) | x `elem` xs = rmdups xs
| otherwise = x : rmdups xs
However I'd like to rework this without using elem
. What would be the best method for this?
但是我想在不使用elem
. 什么是最好的方法?
I'd like to do this using my own function and not nub
or nubBy
.
我想使用我自己的函数而不是nub
or来做到这一点nubBy
。
采纳答案by Benjamin Hodgson
I don't think you'll be able to do it without elem
(or your own re-implementation of it).
我认为如果没有elem
(或您自己重新实现它),您将无法做到这一点。
However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the lastone. Personally, I'd expect it to keep the first duplicate item and drop the rest.
但是,您的实现存在语义问题。当元素被复制时,你会保留最后一个。就个人而言,我希望它保留第一个重复项并丢弃其余项。
*Main> rmdups "abacd"
"bacd"
The solution is to thread the 'seen' elements through as a state variable.
解决方案是将“可见”元素作为状态变量穿过。
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
where rdHelper seen [] = seen
rdHelper seen (x:xs)
| x `elem` seen = rdHelper seen xs
| otherwise = rdHelper (seen ++ [x]) xs
This is more-or-less how nub
is implemented in the standard library (read the source here). The small difference in nub
's implementation ensures that it is non-strict, while removeDuplicates
above is strict (it consumes the entire list before returning).
这或多或少nub
是如何在标准库中实现的(阅读此处的源代码)。的实现中的微小差异nub
确保它是非严格的,而removeDuplicates
上面是严格的(它在返回之前消耗了整个列表)。
Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates
can be implemented in one line with foldl
:
如果您不担心严格性,原始递归在这里实际上是多余的。removeDuplicates
可以在一行中实现foldl
:
removeDuplicates2 = foldl (\seen x -> if x `elem` seen
then seen
else seen ++ [x]) []
回答by scvalex
Both your code and nub
have O(N^2)
complexity.
您的代码和nub
都具有O(N^2)
复杂性。
You can improve the complexity to O(N log N)
and avoid using elem
by sorting, grouping, and taking only the first element of each group.
您可以通过排序、分组和只取每组的第一个元素来提高复杂性O(N log N)
并避免使用elem
。
Conceptually,
从概念上讲,
rmdups :: (Ord a) => [a] -> [a]
rmdups = map head . group . sort
Suppose you start with the list [1, 2, 1, 3, 2, 4]
. By sorting it, you get, [1, 1, 2, 2, 3, 4]
; by grouping that, you get, [[1, 1], [2, 2], [3], [4]]
; finally, by taking the head of each list, you get [1, 2, 3, 4]
.
假设您从列表开始[1, 2, 1, 3, 2, 4]
。通过对它进行排序,你会得到,[1, 1, 2, 2, 3, 4]
;通过分组,你得到,[[1, 1], [2, 2], [3], [4]]
;最后,通过获取每个列表的头部,你得到[1, 2, 3, 4]
.
The full implementation of the above just involves expanding each function.
上面的完整实现只涉及扩展每个功能。
Note that this requires the stronger Ord
constraint on the elements of the list, and also changes their order in the returned list.
请注意,这需要对Ord
列表元素进行更强的约束,并且还会更改它们在返回列表中的顺序。
回答by The Internet
Even easier.
更容易。
import Data.Set
mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList
Convert the set to a list of elements in O(n)time:
在O(n)时间内将集合转换为元素列表:
toList :: Set a -> [a]
toList :: Set a -> [a]
Create a set from a list of elements in O(n log n)time:
在O(n log n)时间内从元素列表创建一个集合:
fromList :: Ord a => [a] -> Set a
fromList :: Ord a => [a] -> Set a
In python it would be no different.
在 python 中也没有什么不同。
def mkUniq(x):
return list(set(x)))
回答by Nikita Volkov
Same as @scvalex's solution the following has an O(n * log n)
complexity and an Ord
dependency. In difference to it, it preserves the order, keeping the first occurences of items.
与@scvalex 的解决方案相同,以下内容具有O(n * log n)
复杂性和Ord
依赖性。与它不同的是,它保留顺序,保留项目的第一次出现。
import qualified Data.Set as Set
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
rmdups' _ [] = []
rmdups' a (b : c) = if Set.member b a
then rmdups' a c
else b : rmdups' (Set.insert b a) c
Benchmark results
基准测试结果
As you can see, the benchmark results prove this solution to be the most effective. You can find the source of this benchmark here.
如您所见,基准测试结果证明该解决方案是最有效的。您可以在此处找到此基准测试的来源。
回答by Benjamin Hodgson
Using recursion-schemes:
使用递归方案:
import Data.Functor.Foldable
dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
where pseudoalgebra Nil = []
pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs
While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.
虽然这当然更高级,但我认为它非常优雅,并展示了一些有价值的函数式编程范例。
回答by Muhammed Hasan Celik
It is too late to answer this question but I want to share my solution which is original without using elem
and don't assume Ord
.
现在回答这个问题为时已晚,但我想分享我的解决方案,该解决方案是原始的,不使用elem
也不假设Ord
.
rmdups' :: (Eq a) => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k | k <- rmdups'(xs), k /=x ]
This solution removes duplicates in the end of input, while question implementation deletes in the beginning. For example,
该解决方案在输入末尾删除重复项,而问题实现则在开头删除。例如,
rmdups "maximum-minimum"
-- "ax-nium"
rmdups' "maximum-minimum"
-- ""maxiu-n"
Also, this code complexity is O(N*K) where N is the length of string and K is the number of unique characters in the string. N >= K thus, it will be O(N^2) in worst-case but this means that there is no repetition in the string and this is unlike since you try to delete duplicates in the string.
此外,此代码复杂度为 O(N*K),其中 N 是字符串的长度,K 是字符串中唯一字符的数量。N >= K 因此,在最坏的情况下它将是 O(N^2) 但这意味着字符串中没有重复,这与您尝试删除字符串中的重复项不同。
回答by mrkanet
You can use this compress function too.
您也可以使用此压缩功能。
cmprs ::Eq a=>[a] -> [a]
--cmprs [] = [] --not necessary
cmprs (a:as)
|length as == 1 = as
|a == (head as) = cmprs as
|otherwise = [a]++cmprs as
回答by Lokesh Mohanty
Using dropWhile also works, but remember to sort the list before using this
使用 dropWhile 也可以,但请记住在使用之前对列表进行排序
rmdups :: (Eq a) => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : (rmdups $ dropWhile (\y -> y == x) xs)
回答by fp_mora
Graham Huttonhas a rmdups
function on p. 86 of Programming in Haskell. It preserves order. It is as follows.
Graham Huttonrmdups
在 p 上有一个函数。86 的Haskell 编程。它保持秩序。如下。
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)
rmdups "maximum-minimum"
"maxiu-n"
"maxiu-n"
This was bothering me until I saw Hutton's function. Then, I tried, again. There are two versions, The first keeps the last duplicate, the second keeps the first.
这一直困扰着我,直到我看到 Hutton 的功能。然后,我再次尝试。有两个版本,第一个保留最后一个副本,第二个保留第一个。
rmdups ls = [d|(z,d)<- zip [0..] ls, notElem d $ take z ls]
rmdups "maximum-minimum"
"maxiu-n"
"maxiu-n"
If you want to take the first and not the last duplicate elements of the list, as you are trying to do, just change take
to drop
in the function and change the enumeration zip [0..]
to zip [1..]
.
如果你想采取的第一个,而不是列表的最后一个重复的元素,因为你正在尝试做的,只是改变take
以drop
在功能和改变枚举zip [0..]
到zip [1..]
。
回答by Fabio
...or by using the function union from Data.List applied to itself:
...或使用来自 Data.List 的函数 union 应用于自身:
import Data.List
unique x = union x x