.NET: Which method performs better, .Any() or .Count() > 0?

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/305092/

Date: 2020-09-03 10:39:34  Source: igfitidea

Which method performs better: .Any() vs .Count() > 0?

Tags: .net, linq, performance, .net-3.5, extension-methods

Asked by Pure.Krome

In the System.Linq namespace, we can now extend our IEnumerable's to have the Any() and Count() extension methods.

I was told recently that if I want to check that a collection contains one or more items, I should use the .Any() extension method instead of .Count() > 0, because the .Count() extension method has to iterate through all the items.

Secondly, some collections have a property (not an extension method) called Count or Length. Would it be better to use those instead of .Any() or .Count()?

Yea / nay?

Answer by Marc Gravell

If you are starting with something that has a .Length or .Count (such as ICollection<T>, IList<T>, List<T>, etc.), then this will be the fastest option, since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() sequence required by Any() to check for a non-empty IEnumerable<T> sequence.

For just IEnumerable<T>, Any() will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count() does check for ICollection<T> (using .Count as an optimisation), so if your underlying data source is directly a list/collection, there won't be a huge difference. Don't ask me why it doesn't use the non-generic ICollection...

Of course, if you have used LINQ to filter it (Where etc.), you will have an iterator-block based sequence, and so this ICollection<T> optimisation is useless.

In general, with IEnumerable<T>: stick with Any() ;-p
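
The difference for a lazy sequence can be made visible with a minimal sketch (not part of the original answer) that counts how many elements an iterator block actually yields. The names EnumerationDemo, Numbers, and Pulled are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Hypothetical counter: how many items the iterator has yielded so far.
    public static int Pulled;

    // A lazy, iterator-block based sequence (not an ICollection<T>).
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        bool any = Numbers(1000).Any();
        Console.WriteLine($"Any():       result={any}, items pulled={Pulled}");

        Pulled = 0;
        bool nonEmpty = Numbers(1000).Count() > 0;
        Console.WriteLine($"Count() > 0: result={nonEmpty}, items pulled={Pulled}");
    }
}
```

Under these assumptions, Any() stops after the first element (1 item pulled), while Count() > 0 walks the whole sequence (1000 items pulled).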

Answer by nikib3ro

Note: I wrote this answer when Entity Framework 4 was current. The point of this answer was not to get into trivial .Any() vs .Count() performance testing. The point was to signal that EF is far from perfect. Newer versions are better... but if you have a piece of code that's slow and it uses EF, test with direct T-SQL and compare performance rather than relying on assumptions (such as that .Any() is ALWAYS faster than .Count() > 0).



While I agree with the most up-voted answer and comments - especially on the point that Any signals developer intent better than Count() > 0 - I've had a situation in which Count was faster by an order of magnitude on SQL Server (Entity Framework 4).

Here is the query with Any that threw a timeout exception (on ~200,000 records):

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && !a.NewsletterLogs.Any(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr)
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

The Count version executed in a matter of milliseconds:

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && a.NewsletterLogs.Count(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr) == 0
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

I need to find a way to see the exact SQL both LINQ queries produce - but it's obvious there is a huge performance difference between Count and Any in some cases, and unfortunately it seems you can't just stick with Any in all cases.

EDIT: Here are the generated SQL statements. Beauties, as you can see ;)

ANY:

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[ContactId] AS [ContactId], 
        [Extent1].[CompanyId] AS [CompanyId], 
        [Extent1].[ContactName] AS [ContactName], 
        [Extent1].[FullName] AS [FullName], 
        [Extent1].[ContactStatusId] AS [ContactStatusId], 
        [Extent1].[Created] AS [Created]
        FROM [dbo].[Contact] AS [Extent1]
        WHERE ([Extent1].[CompanyId] = @p__linq__0) AND ([Extent1].[ContactStatusId] <= 3) AND ( NOT EXISTS (SELECT 
            1 AS [C1]
            FROM [dbo].[NewsletterLog] AS [Extent2]
            WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])
        ))
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

COUNT:

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Project1].[ContactId] AS [ContactId], 
        [Project1].[CompanyId] AS [CompanyId], 
        [Project1].[ContactName] AS [ContactName], 
        [Project1].[FullName] AS [FullName], 
        [Project1].[ContactStatusId] AS [ContactStatusId], 
        [Project1].[Created] AS [Created]
        FROM ( SELECT 
            [Extent1].[ContactId] AS [ContactId], 
            [Extent1].[CompanyId] AS [CompanyId], 
            [Extent1].[ContactName] AS [ContactName], 
            [Extent1].[FullName] AS [FullName], 
            [Extent1].[ContactStatusId] AS [ContactStatusId], 
            [Extent1].[Created] AS [Created], 
            (SELECT 
                COUNT(1) AS [A1]
                FROM [dbo].[NewsletterLog] AS [Extent2]
                WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])) AS [C1]
            FROM [dbo].[Contact] AS [Extent1]
        )  AS [Project1]
        WHERE ([Project1].[CompanyId] = @p__linq__0) AND ([Project1].[ContactStatusId] <= 3) AND (0 = [Project1].[C1])
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

It seems that a plain Where with EXISTS performs much worse than calculating Count and then doing a Where with Count == 0.

Let me know if you see an error in my findings. What can be taken from all this, regardless of the Any vs Count discussion, is that any more complex LINQ query is way better off when rewritten as a stored procedure ;).

Answer by kamil-mrzyglod

Since this is a rather popular topic and the answers differ, I had to take a fresh look at the problem.

Testing environment: EF 6.1.3, SQL Server, 300k records

Table model:

class TestTable
{
    [Key]
    public int Id { get; set; }

    public string Name { get; set; }

    public string Surname { get; set; }
}

Test code:

class Program
{
    static void Main()
    {
        using (var context = new TestContext())
        {
            context.Database.Log = Console.WriteLine;

            context.TestTables.Where(x => x.Surname.Contains("Surname")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname")).Count(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Count(x => x.Id > 1000);

            Console.ReadLine();
        }
    }
}

Results:

Any() ~ 3ms

Count() ~ 230ms for first query, ~ 400ms for second

Remarks:

In my case, EF didn't generate SQL like the one @Ben mentioned in his post.

Answer by Ben

EDIT: This was fixed in EF version 6.1.1, and this answer is no longer current.

For SQL Server and EF4-6, Count() performs about twice as fast as Any().

When you run Table.Any(), it will generate something like this (warning: don't hurt your brain trying to understand it):

SELECT 
CASE WHEN ( EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent1]
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent2]
)) THEN cast(0 as bit) END AS [C1]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]

which requires two scans of the rows matching your condition.

I don't like to write Count() > 0 because it hides my intention. I prefer to use a custom predicate for this:

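using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
    // Reads like Any(predicate), but is translated as a COUNT query.
    public static bool Exists<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
    {
        return source.Count(predicate) > 0;
    }
}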

Answer by Timothy Gonzalez

It depends: how big is the data set, and what are your performance requirements?

If it's nothing gigantic, use the most readable form, which for me is Any(), because it's shorter and more readable than an equation.

Answer by Bronks

You can make a simple test to figure this out:

// Requires: using System.Diagnostics; using System.Linq;
var query = /* make any query here */;

// Time Count() > 0 (Count() is the LINQ extension method).
var timeCount = Stopwatch.StartNew();
if (query.Count() > 0)
{
}
timeCount.Stop();
var testCount = timeCount.Elapsed;

// Time Any().
var timeAny = Stopwatch.StartNew();
if (query.Any())
{
}
timeAny.Stop();
var testAny = timeAny.Elapsed;

Check the values of testCount and testAny.
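
A single timed run is easily skewed by JIT compilation and other noise, so one possible refinement is to warm up once and then time several runs. The sketch below is only an illustration: the names AnyVsCountTiming and Measure, the in-memory stand-in query, and the iteration count of 100 are all assumptions, not part of the original answer.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class AnyVsCountTiming
{
    // Runs the check once to warm up (JIT), then times `iterations` further runs.
    public static TimeSpan Measure(Func<bool> check, int iterations)
    {
        check();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Stand-in for "any query": a filtered in-memory sequence.
        var query = Enumerable.Range(0, 1000000).Where(x => x % 2 == 0);

        var testCount = Measure(() => query.Count() > 0, 100);
        var testAny = Measure(() => query.Any(), 100);

        Console.WriteLine($"Count() > 0: {testCount.TotalMilliseconds} ms");
        Console.WriteLine($"Any():       {testAny.TotalMilliseconds} ms");
    }
}
```

For a database-backed query the numbers also include network and server time, so they are best compared on the real data source rather than on an in-memory stand-in like this one.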

Answer by Thiago Coelho

About the Count() method: if the IEnumerable is an ICollection, then we don't need to iterate over all the items, because we can read the Count property of the ICollection. If the IEnumerable is not an ICollection, we must iterate over all the items using a while loop with MoveNext. Take a look at the .NET Framework code:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) 
        throw Error.ArgumentNull("source");

    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null) 
        return collectionoft.Count;

    ICollection collection = source as ICollection;
    if (collection != null) 
        return collection.Count;

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

Reference: Reference Source, Enumerable
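
The shortcut in that code can be observed from the outside with a small sketch. TrackingCollection and CountShortcutDemo are hypothetical names introduced here for illustration; the wrapper simply records whether GetEnumerator() was ever called.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that records whether anyone enumerated it.
public sealed class TrackingCollection<T> : ICollection<T>
{
    private readonly List<T> _items;
    public bool WasEnumerated { get; private set; }

    public TrackingCollection(IEnumerable<T> items) { _items = new List<T>(items); }

    public int Count => _items.Count;
    public bool IsReadOnly => false;
    public void Add(T item) => _items.Add(item);
    public void Clear() => _items.Clear();
    public bool Contains(T item) => _items.Contains(item);
    public void CopyTo(T[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);
    public bool Remove(T item) => _items.Remove(item);

    public IEnumerator<T> GetEnumerator()
    {
        WasEnumerated = true;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CountShortcutDemo
{
    public static void Main()
    {
        // Direct collection: Enumerable.Count can use the Count property.
        var direct = new TrackingCollection<int>(new[] { 1, 2, 3 });
        int n = Enumerable.Count(direct);
        Console.WriteLine($"Count()={n}, enumerated={direct.WasEnumerated}");

        // Filtered sequence: the Where iterator is not an ICollection<T>.
        var filtered = new TrackingCollection<int>(new[] { 1, 2, 3 });
        int m = filtered.Where(x => x > 0).Count();
        Console.WriteLine($"Where(...).Count()={m}, enumerated={filtered.WasEnumerated}");
    }
}
```

The first call should report that the collection was not enumerated, while the second, which goes through Where, has to enumerate the underlying items.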

Answer by Janmejay Kumar

If you are using Entity Framework and have a huge table with many records, Any() will be much faster. I remember one time I wanted to check whether a table was empty, and it had millions of rows. It took 20-30 seconds for Count() > 0 to complete. It was instant with Any().

Any() can be a performance enhancement because it may not have to iterate the collection to get the number of things. It just has to hit one of them. Or, for, say, LINQ-to-Entities, the generated SQL will be IF EXISTS(...) rather than SELECT COUNT ... or even SELECT * ....