list 在没有 elem 的情况下从 Haskell 中的列表中删除重复项

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16108714/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-11 01:59:37  来源:igfitidea点击:

Removing duplicates from a list in Haskell without elem

listhaskellrecursion

提问by BradStevenson

I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:

我正在尝试定义一个函数,该函数将从列表中删除重复项。到目前为止,我有一个有效的实现:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs)   | x `elem` xs   = rmdups xs
                | otherwise     = x : rmdups xs

However I'd like to rework this without using elem. What would be the best method for this?

但是我想在不使用elem. 什么是最好的方法?

I'd like to do this using my own function and not nubor nubBy.

我想使用我自己的函数而不是nubor来做到这一点nubBy

采纳答案by Benjamin Hodgson

I don't think you'll be able to do it without elem(or your own re-implementation of it).

我认为如果没有elem(或您自己重新实现它),您将无法做到这一点。

However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the lastone. Personally, I'd expect it to keep the first duplicate item and drop the rest.

但是,您的实现存在语义问题。当元素被复制时,你会保留最后一个。就个人而言,我希望它保留第一个重复项并丢弃其余项。

*Main> rmdups "abacd"
"bacd"

The solution is to thread the 'seen' elements through as a state variable.

解决方案是将“可见”元素作为状态变量穿过。

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
    where rdHelper seen [] = seen
          rdHelper seen (x:xs)
              | x `elem` seen = rdHelper seen xs
              | otherwise = rdHelper (seen ++ [x]) xs

This is more-or-less how nubis implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicatesabove is strict (it consumes the entire list before returning).

这或多或少nub是如何在标准库中实现的(阅读此处的源代码)。的实现中的微小差异nub确保它是非严格的,而removeDuplicates上面是严格的(它在返回之前消耗了整个列表)。

Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicatescan be implemented in one line with foldl:

如果您不担心严格性,原始递归在这里实际上是多余的。removeDuplicates可以在一行中实现foldl

removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []

回答by scvalex

Both your code and nubhave O(N^2)complexity.

您的代码和nub都具有O(N^2)复杂性。

You can improve the complexity to O(N log N)and avoid using elemby sorting, grouping, and taking only the first element of each group.

您可以通过排序、分组和只取每组的第一个元素来提高复杂性O(N log N)并避免使用elem

Conceptually,

从概念上讲,

rmdups :: (Ord a) => [a] -> [a]
rmdups = map head . group . sort

Suppose you start with the list [1, 2, 1, 3, 2, 4]. By sorting it, you get, [1, 1, 2, 2, 3, 4]; by grouping that, you get, [[1, 1], [2, 2], [3], [4]]; finally, by taking the head of each list, you get [1, 2, 3, 4].

假设您从列表开始[1, 2, 1, 3, 2, 4]。通过对它进行排序,你会得到,[1, 1, 2, 2, 3, 4];通过分组,你得到,[[1, 1], [2, 2], [3], [4]];最后,通过获取每个列表的头部,你得到[1, 2, 3, 4].

The full implementation of the above just involves expanding each function.

上面的完整实现只涉及扩展每个功能。

Note that this requires the stronger Ordconstraint on the elements of the list, and also changes their order in the returned list.

请注意,这需要对Ord列表元素进行更强的约束,并且还会更改它们在返回列表中的顺序。

回答by The Internet

Even easier.

更容易。

import Data.Set 
mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList

Convert the set to a list of elements in O(n)time:

O(n)时间内将集合转换为元素列表:

toList :: Set a -> [a]
toList :: Set a -> [a]

Create a set from a list of elements in O(n log n)time:

O(n log n)时间内从元素列表创建一个集合:

fromList :: Ord a => [a] -> Set a
fromList :: Ord a => [a] -> Set a

In python it would be no different.

在 python 中也没有什么不同。

def mkUniq(x): 
   return list(set(x)))

回答by Nikita Volkov

Same as @scvalex's solution the following has an O(n * log n)complexity and an Orddependency. In difference to it, it preserves the order, keeping the first occurences of items.

与@scvalex 的解决方案相同,以下内容具有O(n * log n)复杂性和Ord依赖性。与它不同的是,它保留顺序,保留项目的第一次出现。

import qualified Data.Set as Set

rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c

Benchmark results

基准测试结果

benchmark results

基准结果

As you can see, the benchmark results prove this solution to be the most effective. You can find the source of this benchmark here.

如您所见,基准测试结果证明该解决方案是最有效的。您可以在此处找到此基准测试的来源。

回答by Benjamin Hodgson

Using recursion-schemes:

使用递归方案

import Data.Functor.Foldable

dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
    where pseudoalgebra Nil                 = []
          pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs

While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.

虽然这当然更高级,但我认为它非常优雅,并展示了一些有价值的函数式编程范例。

回答by Muhammed Hasan Celik

It is too late to answer this question but I want to share my solution which is original without using elemand don't assume Ord.

现在回答这个问题为时已晚,但我想分享我的解决方案,该解决方案是原始的,不使用elem也不假设Ord.

rmdups' :: (Eq a) => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k  | k <- rmdups'(xs), k /=x ]

This solution removes duplicates in the end of input, while question implementation deletes in the beginning. For example,

该解决方案在输入末尾删除重复项,而问题实现则在开头删除。例如,

rmdups "maximum-minimum"
-- "ax-nium"

rmdups' "maximum-minimum"
-- ""maxiu-n"

Also, this code complexity is O(N*K) where N is the length of string and K is the number of unique characters in the string. N >= K thus, it will be O(N^2) in worst-case but this means that there is no repetition in the string and this is unlike since you try to delete duplicates in the string.

此外,此代码复杂度为 O(N*K),其中 N 是字符串的长度,K 是字符串中唯一字符的数量。N >= K 因此,在最坏的情况下它将是 O(N^2) 但这意味着字符串中没有重复,这与您尝试删除字符串中的重复项不同。

回答by mrkanet

You can use this compress function too.

您也可以使用此压缩功能。

cmprs ::Eq a=>[a] -> [a]
--cmprs [] = [] --not necessary
cmprs (a:as) 
    |length as == 1 = as
    |a == (head as) = cmprs as
    |otherwise = [a]++cmprs as

回答by Lokesh Mohanty

Using dropWhile also works, but remember to sort the list before using this

使用 dropWhile 也可以,但请记住在使用之前对列表进行排序

rmdups :: (Eq a) => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : (rmdups $ dropWhile (\y -> y == x) xs)

回答by fp_mora

Graham Huttonhas a rmdupsfunction on p. 86 of Programming in Haskell. It preserves order. It is as follows.

Graham Huttonrmdups在 p 上有一个函数。86 的Haskell 编程。它保持秩序。如下。

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)
rmdups "maximum-minimum"

"maxiu-n"

"maxiu-n"

This was bothering me until I saw Hutton's function. Then, I tried, again. There are two versions, The first keeps the last duplicate, the second keeps the first.

这一直困扰着我,直到我看到 Hutton 的功能。然后,我再次尝试。有两个版本,第一个保留最后一个副本,第二个保留第一个。

rmdups ls = [d|(z,d)<- zip [0..] ls, notElem d $ take z ls]
rmdups "maximum-minimum"

"maxiu-n"

"maxiu-n"

If you want to take the first and not the last duplicate elements of the list, as you are trying to do, just change taketo dropin the function and change the enumeration zip [0..]to zip [1..].

如果你想采取的第一个,而不是列表的最后一个重复的元素,因为你正在尝试做的,只是改变takedrop在功能和改变枚举zip [0..]zip [1..]

回答by Fabio

...or by using the function union from Data.List applied to itself:

...或使用来自 Data.List 的函数 union 应用于自身:

import Data.List

unique x = union x x