list 有效地检查(大)列表的所有元素是否相同
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6121256/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
efficiently checking that all the elements of a (big) list are the same
提问by MarcoS
Problem
问题
Let us suppose that we have a list xs
(possibly a very big one), and we want to check that all its elements are the same.
让我们假设我们有一个列表xs
(可能是一个非常大的列表),我们想检查它的所有元素是否相同。
I came up with various ideas:
我想出了各种想法:
Solution 0
解决方案 0
checking that all elements in tail xs
are equal to head xs
:
检查中的所有元素tail xs
是否等于head xs
:
allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)
Solution 1
解决方案1
checking that length xs
is equal to the length of the list obtained by taking elements from xs
while they're equal to head xs
检查length xs
是否等于通过获取元素xs
而获得的列表的长度,而它们等于head xs
allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)
Solution 2
解决方案2
recursive solution: allTheSame
returns True
if the first two elements of xs
are equal and allTheSame
returns True
on the rest of xs
递归溶液:allTheSame
返回True
如果前两个元素xs
是相等的,并allTheSame
返回True
上的其余部分xs
allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
| n == 0 = False
| n == 1 = True
| n == 2 = xs !! 0 == xs !! 1
| otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
where n = length xs
Solution 3
解决方案3
divide and conquer:
分而治之:
allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
| n == 0 = False
| n == 1 = True
| n == 2 = xs !! 0 == xs !! 1
| n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
| otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
where n = length xs
split = splitAt (n `div` 2) xs
Solution 4
解决方案4
I just thought about this while writing this question:
在写这个问题的时候我只是想过这个:
allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)
Questions
问题
I think Solution 0 is not very efficient, at least in terms of memory, because
map
will construct another list before applyingand
to its elements. Am I right?Solution 1 is still not very efficient, at least in terms of memory, because
takeWhile
will again build an additional list. Am I right?Solution 2 is tail recursive (right?), and it should be pretty efficient, because it will return
False
as soon as(xs !! 0 == xs !! 1)
is False. Am I right?Solution 3 should be the best one, because it complexity should be O(log n)
Solution 4 looks quite Haskellish to me (is it?), but it's probably the same as Solution 0, because
all p = and . map p
(from Prelude.hs). Am I right?Are there other better ways of writing
allTheSame
? Now, I expect someone will answer this question telling me that there's a build-in function that does this: I've searched with hoogle and I haven't found it. Anyway, since I'm learning Haskell, I believe that this was a good exercise for me :)
我认为解决方案 0 不是很有效,至少在内存方面,因为
map
会在应用and
到其元素之前构造另一个列表。我对吗?解决方案 1 仍然不是很有效,至少在内存方面,因为
takeWhile
将再次构建一个额外的列表。我对吗?解决方案 2 是尾递归的(对吗?),它应该非常有效,因为它会
False
在(xs !! 0 == xs !! 1)
为 False 时立即返回。我对吗?解决方案 3 应该是最好的一个,因为它的复杂度应该是 O(log n)
解决方案 4 对我来说看起来很 Haskellish(是吗?),但它可能与解决方案 0 相同,因为
all p = and . map p
(来自 Prelude.hs)。我对吗?还有其他更好的写作方式
allTheSame
吗?现在,我希望有人会回答这个问题,告诉我有一个内置函数可以做到这一点:我用 hoogle 搜索过,但没有找到。无论如何,因为我正在学习 Haskell,所以我相信这对我来说是一个很好的练习 :)
Any other comment is welcome. Thank you!
欢迎任何其他评论。谢谢!
回答by luqui
gatoatigrado's answer gives some nice advice for measuring the performance of various solutions. Here is a more symbolic answer.
gatoatigrado 的回答为衡量各种解决方案的性能提供了一些很好的建议。这是一个更具象征意义的答案。
I think solution 0 (or, exactly equivalently, solution 4) will be the fastest. Remember that Haskell is lazy, so map
will not have to construct the whole list before and
is applied. A good way to build intuition about this is to play with infinity. So for example:
我认为解决方案 0(或完全等效的解决方案 4)将是最快的。请记住,Haskell 是惰性的,因此map
在and
应用之前不必构建整个列表。对此建立直觉的一个好方法是玩无穷大。例如:
ghci> and $ map (< 1000) [1..]
False
This asks whether all numbers are less than 1,000. If map
constructed the entire list before and
were applied, then this question could never be answered. The expression will still answer quickly even if you give the list a very large right endpoint (that is, Haskell is not doing any "magic" depending on whether a list is infinite).
这询问是否所有数字都小于 1,000。如果map
在and
应用之前构建整个列表,那么这个问题将永远无法回答。即使你给列表一个非常大的右端点(也就是说,根据列表是否无限,Haskell 没有做任何“魔术”),表达式仍然会很快回答。
To start my example, let's use these definitions:
要开始我的示例,让我们使用以下定义:
and [] = True
and (x:xs) = x && and xs
map f [] = []
map f (x:xs) = f x : map f xs
True && x = x
False && x = False
Here is the evaluation order for allTheSame [7,7,7,7,8,7,7,7]
. There will be extra sharing that is too much of a pain to write down. I will also evaluate the head
expression earlier than it would be for conciseness (it would have been evaluated anyway, so it's hardly different).
这是 的评估顺序allTheSame [7,7,7,7,8,7,7,7]
。会有额外的分享,写下来太痛苦了。head
为了简洁起见,我还将比它更早地评估表达式(无论如何它都会被评估,所以它几乎没有什么不同)。
allTheSame [7,7,7,7,8,7,7,7]
allTheSame (7:7:7:7:8:7:7:7:[])
and $ map (== head (7:7:7:7:8:7:7:7:[])) (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7) (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7) (7:7:7:8:7:7:7:[])
and $ (== 7) 7 : map (== 7) (7:7:8:7:7:7:[])
(== 7) 7 && and (map (== 7) (7:7:8:7:7:7:[]))
True && and (map (== 7) (7:7:8:7:7:7:[]))
and (map (== 7) (7:7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7) (7:8:7:7:7:[]))
True && and (map (== 7) (7:8:7:7:7:[]))
and (map (== 7) (7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7) (8:7:7:7:[]))
True && and (map (== 7) (8:7:7:7:[]))
and (map (== 7) (8:7:7:7:[]))
(== 7) 8 && and (map (== 7) (7:7:7:[]))
False && and (map (== 7) (7:7:7:[]))
False
See how we didn't even have to check the last 3 7's? This is lazy evaluation making a list work more like a loop. All your other solutions use expensive functions like length
(which have to walk all the way to the end of the list to give an answer), so they will be less efficient and also they will not work on infinite lists. Working on infinite lists and being efficient often go together in Haskell.
看看我们是如何甚至不必检查最后 3 个 7 的?这是一种懒惰的评估,使列表更像一个循环。所有其他解决方案都使用昂贵的函数,例如length
(必须一直走到列表的末尾才能给出答案),因此它们的效率会降低,并且它们也不会在无限列表上工作。在 Haskell 中,处理无限列表和提高效率通常是相辅相成的。
回答by John L
First of all, I don't think you want to be working with lists. A lot of your algorithms rely upon calculating the length, which is bad. You may want to consider the vectorpackage, which will give you O(1) length compared to O(n) for a list. Vectors are also much more memory efficient, particularly if you can use Unboxed or Storable variants.
首先,我认为您不想使用列表。你的很多算法都依赖于计算长度,这很糟糕。您可能需要考虑vector包,与列表的 O(n) 相比,它会给您 O(1) 长度。Vectors 的内存效率也更高,特别是如果您可以使用 Unboxed 或 Storable 变体。
That being said, you really need to consider traversals and usage patterns in your code. Haskell's lists are very efficient if they can be generated on demand and consumed once. This means that you shouldn't hold on to references to a list. Something like this:
话虽如此,您确实需要考虑代码中的遍历和使用模式。如果 Haskell 的列表可以按需生成并使用一次,那么它们是非常有效的。这意味着您不应该保留对列表的引用。像这样的东西:
average xs = sum xs / length xs
requires that the entire list be retained in memory (by either sum
or length
) until both traversals are completed. If you can do your list traversal in one step, it'll be much more efficient.
要求将整个列表保留在内存中(通过sum
或length
),直到完成两次遍历。如果你可以一步完成你的列表遍历,它会更有效率。
Of course, you may need to retain the list anyway, such as to check if all the elements are equal, and if they aren't, do something else with the data. In this case, with lists of any size you're probably better off with a more compact data structure (e.g. vector).
当然,您可能无论如何都需要保留列表,例如检查所有元素是否相等,如果不相等,则对数据执行其他操作。在这种情况下,对于任何大小的列表,使用更紧凑的数据结构(例如向量)可能会更好。
Now that this is out of they way, here's a look at each of these functions. Where I show core, it was generated with ghc-7.0.3 -O -ddump-simpl
. Also, don't bother judging Haskell code performance when compiled with -O0. Compile it with the flags you would actually use for production code, typically at least -O and maybe other options too.
既然已经解决了这些问题,我们来看看这些功能中的每一个。在我展示核心的地方,它是用ghc-7.0.3 -O -ddump-simpl
. 另外,不要在使用 -O0 编译时费心判断 Haskell 代码的性能。使用您实际用于生产代码的标志编译它,通常至少是 -O 和其他选项。
Solution 0
解决方案 0
allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)
GHC produces this Core:
GHC 产生这个核心:
Test.allTheSame
:: forall a_abG. GHC.Classes.Eq a_abG => [a_abG] -> GHC.Bool.Bool
[GblId,
Arity=2,
Str=DmdType LS,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [3 3] 16 0}]
Test.allTheSame =
\ (@ a_awM)
($dEq_awN :: GHC.Classes.Eq a_awM)
(xs_abH :: [a_awM]) ->
case xs_abH of _ {
[] ->
GHC.List.tail1
`cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
:: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
: ds1_axJ xs1_axK ->
letrec {
go_sDv [Occ=LoopBreaker] :: [a_awM] -> GHC.Bool.Bool
[LclId, Arity=1, Str=DmdType S]
go_sDv =
\ (ds_azk :: [a_awM]) ->
case ds_azk of _ {
[] -> GHC.Bool.True;
: y_azp ys_azq ->
case GHC.Classes.== @ a_awM $dEq_awN y_azp ds1_axJ of _ {
GHC.Bool.False -> GHC.Bool.False; GHC.Bool.True -> go_sDv ys_azq
}
}; } in
go_sDv xs1_axK
}
This looks pretty good, actually. It will produce an error with an empty list, but that's easily fixed. This is the case xs_abH of _ { [] ->
. After this GHC performed a worker/wrapper transformation, the recursive worker function is the letrec { go_sDv
binding. The worker examines its argument. If []
, it's reached the end of the list and returns True. Otherwise it compares the head of the remaining to the first element and either returns False or checks the rest of the list.
这看起来很不错,实际上。它会产生一个空列表的错误,但这很容易修复。这是case xs_abH of _ { [] ->
. 在这个 GHC 执行了一个 worker/wrapper 转换之后,递归的 worker 函数就是letrec { go_sDv
绑定。工人检查其论点。如果[]
,则到达列表末尾并返回 True。否则,它将剩余的头部与第一个元素进行比较,并返回 False 或检查列表的其余部分。
Three other features.
其他三个特点。
- The
map
was entirely fused away and doesn't allocate a temporary list. - Near the top of the definition
notice the
Cheap=True
statement. This means GHC considers the function "cheap", and thus a candidate for inlining. At a call site, if a concrete argument type can be determined, GHC will probably inlineallTheSame
and produce a very tight inner loop, completely bypassing theEq
dictionary lookup. - The worker function is tail-recursive.
- 将
map
被完全融合的路程,不分配的临时列表。 - 靠近定义顶部的注意
Cheap=True
语句。这意味着 GHC 认为函数“便宜”,因此是内联的候选者。在调用站点,如果可以确定具体的参数类型,GHC 可能会内联allTheSame
并产生一个非常紧密的内部循环,完全绕过Eq
字典查找。 - 工作函数是尾递归的。
Verdict: Very strong contender.
结论:非常有力的竞争者。
Solution 1
解决方案1
allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)
Even without looking at core I know this won't be as good. The list is traversed more than once, first by length xs
then by length $ takeWhile
. Not only do you have the extra overhead of multiple traversals, it means that the list must be retained in memory after the first traversal and can't be GC'd. For a big list, this is a serious problem.
即使不看核心,我也知道这不会那么好。该列表被多次遍历,首先由length xs
then遍历length $ takeWhile
。你不仅有多次遍历的额外开销,这意味着列表必须在第一次遍历后保留在内存中,不能被 GC 处理。对于一个大列表,这是一个严重的问题。
Test.allTheSame'
:: forall a_abF. GHC.Classes.Eq a_abF => [a_abF] -> GHC.Bool.Bool
[GblId,
Arity=2,
Str=DmdType LS,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [3 3] 20 0}]
Test.allTheSame' =
\ (@ a_awF)
($dEq_awG :: GHC.Classes.Eq a_awF)
(xs_abI :: [a_awF]) ->
case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
case GHC.List.$wlen
@ a_awF
(GHC.List.takeWhile
@ a_awF
(let {
ds_sDq :: a_awF
[LclId, Str=DmdType]
ds_sDq =
case xs_abI of _ {
[] -> GHC.List.badHead @ a_awF; : x_axk ds1_axl -> x_axk
} } in
\ (ds1_dxa :: a_awF) ->
GHC.Classes.== @ a_awF $dEq_awG ds1_dxa ds_sDq)
xs_abI)
0
of ww1_XCn { __DEFAULT ->
GHC.Prim.==# ww_aC6 ww1_XCn
}
}
Looking at the core doesn't tell much beyond that. However, note these lines:
看看核心并不能说明更多。但是,请注意以下几行:
case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
case GHC.List.$wlen
This is where the list traversals happen. The first gets the length of the outer list and binds it to ww_aC6
. The second gets the length of the inner list, but the binding doesn't happen until near the bottom, at
这就是列表遍历发生的地方。第一个获取外部列表的长度并将其绑定到ww_aC6
。第二个获取内部列表的长度,但直到底部附近才发生绑定,在
of ww1_XCn { __DEFAULT ->
GHC.Prim.==# ww_aC6 ww1_XCn
The lengths (both Int
s) can be unboxed and compared by a primop, but that's a small consolation after the overhead that's been introduced.
长度(都是Int
s)可以拆箱并由 primop 进行比较,但在引入开销后,这是一个小小的安慰。
Verdict: Not good.
结论:不好。
Solution 2
解决方案2
allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
| n == 0 = False
| n == 1 = True
| n == 2 = xs !! 0 == xs !! 1
| otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
where n = length xs
This has the same problem as solution 1. The list is traversed multiple times, and it can't be GC'd. It's worse here though, because now the length is calculated for each sub-list. I'd expect this to have the worst performance of all on lists of any significant size. Also, why are you special-casing lists of 1 and 2 elements when you're expecting the list to be big?
这与解决方案1存在相同的问题。列表多次遍历,并且无法进行GC。不过这里更糟,因为现在为每个子列表计算长度。我希望这在任何重要规模的列表中都有最差的表现。另外,当您期望列表很大时,为什么要对 1 和 2 元素的特殊大小写列表?
Verdict: Don't even think about it.
结论:别想了。
Solution 3
解决方案3
allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
| n == 0 = False
| n == 1 = True
| n == 2 = xs !! 0 == xs !! 1
| n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
| otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
where n = length xs
split = splitAt (n `div` 2) xs
This has the same problem as Solution 2. Namely, the list is traversed multiple times by length
. I'm not certain a divide-and-conquer approach is a good choice for this problem, it could end up taking longer than a simple scan. It would depend on the data though, and be worth testing.
这与解决方案 2 存在相同的问题。即,列表被 多次遍历length
。我不确定分而治之的方法是解决这个问题的好选择,它最终可能比简单的扫描花费更长的时间。不过,这将取决于数据,值得测试。
Verdict: Maybe, if you used a different data structure.
结论:也许,如果您使用不同的数据结构。
Solution 4
解决方案4
allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)
This was basically my first thought. Let's check the core again.
这基本上是我的第一个想法。让我们再次检查核心。
Test.allTheSame''''
:: forall a_abC. GHC.Classes.Eq a_abC => [a_abC] -> GHC.Bool.Bool
[GblId,
Arity=2,
Str=DmdType LS,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, Cheap=True, Expandable=True,
Guidance=IF_ARGS [3 3] 10 0}]
Test.allTheSame'''' =
\ (@ a_am5)
($dEq_am6 :: GHC.Classes.Eq a_am5)
(xs_alK :: [a_am5]) ->
case xs_alK of _ {
[] ->
GHC.List.tail1
`cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
:: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
: ds1_axJ xs1_axK ->
GHC.List.all
@ a_am5
(\ (ds_dwU :: a_am5) ->
GHC.Classes.== @ a_am5 $dEq_am6 ds_dwU ds1_axJ)
xs1_axK
}
Ok, not too bad. Like solution 1, this will error on empty lists. The list traversal is hidden in GHC.List.all
, but it will probably be expanded to good code at a call site.
好吧,还不错。与解决方案 1 一样,这将在空列表上出错。列表遍历隐藏在 中GHC.List.all
,但它可能会在调用站点扩展为好的代码。
Verdict: Another strong contender.
结论:另一个强有力的竞争者。
So between all of these, with lists I'd expect that Solutions 0 and 4 are the only ones worth using, and they are pretty much the same. I might consider Option 3 in some cases.
因此,在所有这些列表中,我希望解决方案 0 和 4 是唯一值得使用的,而且它们几乎相同。在某些情况下,我可能会考虑选项 3。
Edit: in both cases, the errors on empty lists can be simply fixed as in @augustss's answer.
编辑:在这两种情况下,空列表上的错误都可以像@augustss 的回答一样简单地修复。
The next step would be to do some time profiling with criterion.
下一步是使用标准进行一些时间分析。
回答by tokland
A solution using consecutive pairs:
使用连续对的解决方案:
allTheSame xs = and $ zipWith (==) xs (tail xs)
回答by gatoatigrado
Q1 -- Yeah, I think your simple solution is fine, there is no memory leak. Q4 -- Solution 3 is not log(n), via the very simple argument that you need to look at all list elements to determine whether they are the same, and looking at 1 element takes 1 time step. Q5 -- yes. Q6, see below.
Q1 - 是的,我认为您的简单解决方案很好,没有内存泄漏。Q4 -- 解决方案 3 不是 log(n),通过一个非常简单的参数,您需要查看所有列表元素以确定它们是否相同,并且查看 1 个元素需要 1 个时间步长。Q5——是的。Q6,见下文。
The way to go about this is to type it in and run it
解决这个问题的方法是输入并运行它
main = do
print $ allTheSame (replicate 100000000 1)
then run ghc -O3 -optc-O3 --make Main.hs && time ./Main
. I like the last solution best (you can also use pattern matching to clean it up a little),
然后运行ghc -O3 -optc-O3 --make Main.hs && time ./Main
。我最喜欢最后一个解决方案(你也可以使用模式匹配来清理一下),
allTheSame (x:xs) = all (==x) xs
Open up ghci and run ":step fcn" on these things. It will teach you a lot about what lazy evaluation is expanding. In general, when you match a constructor, e.g. "x:xs", that's constant time. When you call "length", Haskell needs to compute all of the elements in the list (though their values are still "to-be-computed"), so solution 1 and 2 are bad.
打开 ghci 并在这些东西上运行 ":step fcn"。它会教你很多关于懒惰评估正在扩展的内容。通常,当您匹配构造函数时,例如“x:xs”,这是常数时间。当您调用“长度”时,Haskell 需要计算列表中的所有元素(尽管它们的值仍然是“待计算的”),因此解决方案 1 和 2 是错误的。
edit 1
编辑 1
Sorry if my previous answer was a bit shallow. It seems like expanding things manually does help a little (though compared to the other options, it's a trivial improvement),
对不起,如果我之前的回答有点肤浅。手动扩展似乎有点帮助(尽管与其他选项相比,这是一个微不足道的改进),
{-# LANGUAGE BangPatterns #-}
allTheSame [] = True
allTheSame ((!x):xs) = go x xs where
go !x [] = True
go !x (!y:ys) = (x == y) && (go x ys)
It seems that ghc is specializing the function already, but you can look at the specialize pragma too, in case it doesn't work for your code [ link].
似乎 ghc 已经专门化了该函数,但您也可以查看 specialize pragma,以防它不适用于您的代码 [链接]。
回答by Ankur
Here is another version (don't need to traverse whole list in case something doesn't match):
这是另一个版本(不需要遍历整个列表,以防某些内容不匹配):
allTheSame [] = True
allTheSame (x:xs) = isNothing $ find (x /= ) xs
This may not be syntactically correct , but I hope you got the idea.
这在语法上可能不正确,但我希望您明白这一点。
回答by dfeuer
Here's another fun way:
这是另一种有趣的方式:
{-# INLINABLE allSame #-}
allSame :: Eq a => [a] -> Bool
allSame xs = foldr go (`seq` True) xs Nothing where
go x r Nothing = r (Just x)
go x r (Just prev) = x == prev && r (Just x)
By keeping track of the previous element, rather than the first one, this implementation can easily be changed to implement increasing
or decreasing
. To check all of them against the first instead, you could rename prev
to first
, and replace Just x
with Just first
.
通过跟踪前一个元素而不是第一个元素,可以轻松地将此实现更改为实现increasing
或decreasing
。要根据第一个检查所有这些,您可以重命名prev
为first
,并替换Just x
为Just first
。
How will this be optimized? I haven't checked in detail, but I'm going to tell a good story based on some things I know about GHC's optimizations.
这将如何优化?我没有详细检查,但我将根据我对 GHC 优化的一些了解来讲述一个好故事。
Suppose first that list fusion does not occur. Then foldr
will be inlined, giving something like
首先假设列表融合没有发生。然后foldr
将被内联,给出类似的东西
allSame xs = allSame' xs Nothing where
allSame' [] = (`seq` True)
allSame' (x : xs) = go x (allSame' xs)
Eta expansion then yields
Eta 扩展然后产生
allSame' [] acc = acc `seq` True
allSame' (x : xs) acc = go x (allSame' xs) acc
Inlining go
,
内联go
,
allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame' xs (Just x)
allSame' (x : xs) (Just prev) =
x == prev && allSame' xs (Just x)
Now GHC can recognize that the Maybe
value is always Just
on the recursive call, and use a worker-wrapper transformation to take advantage of this:
现在 GHC 可以识别该Maybe
值始终Just
在递归调用中,并使用 worker-wrapper 转换来利用这一点:
allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame'' xs x
allSame' (x : xs) (Just prev) = x == prev && allSame'' xs x
allSame'' [] prev = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x
Remember now that
现在记住
allSame xs = allSame' xs Nothing
and allSame'
is no longer recursive, so it can be beta-reduced:
并且allSame'
不再是递归的,所以它可以被 Beta 减少:
allSame [] = True
allSame (x : xs) = allSame'' xs x
allSame'' [] _ = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x
So the higher-order code has turned into efficient recursive code with no extra allocation.
因此,高阶代码变成了无需额外分配的高效递归代码。
Compiling the module defining allSame
using -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures
yields the following (I've cleaned it up a bit):
编译定义allSame
using的模块会-O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures
产生以下结果(我已经清理了一点):
allSame :: forall a. Eq a => [a] -> Bool
allSame =
\ (@ a) ($dEq_a :: Eq a) (xs0 :: [a]) ->
let {
equal :: a -> a -> Bool
equal = == $dEq_a } in
letrec {
go :: [a] -> a -> Bool
go =
\ (xs :: [a]) (prev :: a) ->
case xs of _ {
[] -> True;
: y ys ->
case equal y prev of _ {
False -> False;
True -> go ys y
}
}; } in
case xs0 of _ {
[] -> True;
: x xs -> go xs x
}
As you can see, this is essentially the same as the result I described. The equal = == $dEq_a
bit is where the equality method is extracted from the Eq
dictionary and saved in a variable so it only needs to be extracted once.
如您所见,这与我描述的结果基本相同。该equal = == $dEq_a
位是其中使用了等式方法从所提取的Eq
字典,并保存在一个变量,因此只需要萃取一次。
What if list fusion doesoccur? Here's a reminder of the definition:
如果确实发生列表融合怎么办?这里有一个定义的提醒:
allSame xs = foldr go (`seq` True) xs Nothing where
go x r Nothing = r (Just x)
go x r (Just prev) = x == prev && r (Just x)
If we call allSame (build g)
, the foldr
will fuse with the build
according to the rule foldr c n (build g) = g c n
, yielding
如果我们调用allSame (build g)
,foldr
将build
根据规则与 融合foldr c n (build g) = g c n
,屈服
allSame (build g) = g go (`seq` True) Nothing
That doesn't get us anywhere interesting unless g
is known. So let's choose something simple:
除非g
已知,否则这不会让我们变得有趣。所以让我们选择一些简单的东西:
replicate k0 a = build $ \c n ->
let
rep 0 = n
rep k = a `c` rep (k - 1)
in rep k0
So if h = allSame (replicate k0 a)
, h
becomes
所以如果h = allSame (replicate k0 a)
,h
变成
let
rep 0 = (`seq` True)
rep k = go a (rep (k - 1))
in rep k0 Nothing
Eta expanding,
Eta 扩展,
let
rep 0 acc = acc `seq` True
rep k acc = go a (rep (k - 1)) acc
in rep k0 Nothing
Inlining go
,
内联go
,
let
rep 0 acc = acc `seq` True
rep k Nothing = rep (k - 1) (Just a)
rep k (Just prev) = a == prev && rep (k - 1) (Just a)
in rep k0 Nothing
Again, GHC can see the recursive call is always Just
, so
同样,GHC 可以看到递归调用总是Just
,所以
let
rep 0 acc = acc `seq` True
rep k Nothing = rep' (k - 1) a
rep k (Just prev) = a == prev && rep' (k - 1) a
rep' 0 _ = True
rep' k prev = a == prev && rep' (k - 1) a
in rep k0 Nothing
Since rep
is no longer recursive, GHC can reduce it:
由于rep
不再是递归的,GHC 可以减少它:
let
rep' 0 _ = True
rep' k prev = a == prev && rep' (k - 1) a
in
case k0 of
0 -> True
_ -> rep' (k - 1) a
As you can see, this can run with no allocation whatsoever! Obviously, it's a silly example, but something similar will happen in many more interesting cases. For example, if you write an AllSameTest
module importing the allSame
function and defining
如您所见,这可以在没有任何分配的情况下运行!显然,这是一个愚蠢的例子,但在许多更有趣的情况下也会发生类似的事情。例如,如果您编写一个AllSameTest
导入allSame
函数并定义的模块
foo :: Int -> Bool
foo n = allSame [0..n]
and compile it as described above, you'll get the following (not cleaned up).
并按上述方式编译它,您将得到以下内容(未清理)。
$wfoo :: Int# -> Bool
$wfoo =
\ (ww_s1bY :: Int#) ->
case tagToEnum# (># 0 ww_s1bY) of _ {
False ->
letrec {
$sgo_s1db :: Int# -> Int# -> Bool
$sgo_s1db =
\ (sc_s1d9 :: Int#) (sc1_s1da :: Int#) ->
case tagToEnum# (==# sc_s1d9 sc1_s1da) of _ {
False -> False;
True ->
case tagToEnum# (==# sc_s1d9 ww_s1bY) of _ {
False -> $sgo_s1db (+# sc_s1d9 1) sc_s1d9;
True -> True
}
}; } in
case ww_s1bY of _ {
__DEFAULT -> $sgo_s1db 1 0;
0 -> True
};
True -> True
}
foo :: Int -> Bool
foo =
\ (w_s1bV :: Int) ->
case w_s1bV of _ { I# ww1_s1bY -> $wfoo ww1_s1bY }
That may look disgusting, but you'll note that there are no :
constructors anywhere, and that the Int
s are all unboxed, so the function can run with zero allocation.
这可能看起来很恶心,但您会注意到:
任何地方都没有构造函数,并且Int
s 都是未装箱的,因此该函数可以在零分配的情况下运行。
回答by Alberto Garcia-Raboso
While not very efficient (it will traverse the whole list even if the first two elements don't match), here's a cheeky solution:
虽然效率不高(即使前两个元素不匹配,它也会遍历整个列表),但这是一个厚脸皮的解决方案:
import Data.List (group)
allTheSame :: (Eq a) => [a] -> Bool
allTheSame = (== 1) . length . group
Just for fun.
只是为了好玩。
回答by Gon?alo Faria
This implementation is superior.
这种实现是优越的。
allSame [ ] = True
allSame (h:t) = aux h t
aux x1 [ ] = True
aux x1 (x2:xs) | x1==x2 = aux x2 xs
| otherwise = False
Given the transitivity of the (==) operator, assuming the instance of Eq is well implemented if you wish to assure the equality of a chain of expressions, eg a = b = c = d, you will only need to assure that a=b, b=c, c=d, and that d=a, Instead of the provided techniques above, eg a=b, a=c, a=d, b=c , b=d, c=d.
鉴于 (==) 运算符的传递性,假设 Eq 的实例实现良好,如果您希望确保表达式链的相等性,例如 a = b = c = d,则只需确保 a= b,b=c,c=d,并且d=a,代替上面提供的技术,例如a=b,a=c,a=d,b=c,b=d,c=d。
The solution I proposed grows linearly with the number of elements you wish to test were's the latter is quadratic even if you introduce constant factors in hopes of improving its efficiency.
我提出的解决方案随着您希望测试的元素数量线性增长,即使您引入常数因子以希望提高其效率,后者也是二次的。
It's also superior to the solution using group since you don't have to use length in the end.
它也优于使用 group 的解决方案,因为您最终不必使用长度。
You can also write it nicely in pointwise fashion but I won't bore you with such trivial details.
你也可以以点式方式写得很好,但我不会用这些琐碎的细节让你厌烦。
回答by Jonas K?lker
I think I might just be implementing find
and redoing this. I think it's instructive, though, to see the innards of it. (Note how the solution depends on equality being transitive, though note also how the problemrequires equality to be transitive to be coherent.)
我想我可能只是在实施find
和重做这个。不过,我认为看看它的内部结构是有益的。(注意解决方案如何依赖于等式的传递性,但也要注意问题如何要求等式是传递性的才能保持一致。)
sameElement x:y:xs = if x /= y then Nothing else sameElement y:xs
sameElement [x] = Just x
allEqual [] = True
allEqual xs = isJust $ sameElement xs
I like how sameElement
peeks at the first O(1) elements of the list, then either returns a result or recurses on some suffix of the list, in particular the tail. I don't have anything smart to say about that structure, I just like it :-)
我喜欢sameElement
查看列表的前 O(1) 个元素,然后返回结果或递归列表的某些后缀,特别是尾部。我对这种结构没什么好说的,我只是喜欢它:-)
I think I do the same comparisons as this. If instead I had recursed with sameElement x:xs
, I would compare the head of the input list to each element like in solution 0.
我觉得我做同样的比较,因为这个。相反sameElement x:xs
,如果我使用 递归,我会将输入列表的头部与解决方案 0 中的每个元素进行比较。
Tangent: one could, if one wanted, report the two mismatching elements by replacing Nothing
with Left (x, y)
and Just x
with Right x
and isJust
with either (const False) (const True)
.
切:一个可能,如果一个人想,通过更换报告两个不匹配的元素Nothing
与Left (x, y)
和Just x
带Right x
和isJust
带either (const False) (const True)
。