C#: Is there a performance impact when calling ToList()?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/15516462/
Is there a performance impact when calling ToList()?
Asked by Cody
When using ToList(), is there a performance impact that needs to be considered?
I was writing a query to retrieve files from a directory. This is the query:
string[] imageArray = Directory.GetFiles(directory);
However, since I like to work with List<> instead, I decided to put in...
List<string> imageList = Directory.GetFiles(directory).ToList();
So, is there some sort of performance impact that should be considered when deciding to do a conversion like this, or does it only matter when dealing with a large number of files? Is this a negligible conversion?
Accepted answer by Daniel Imms
IEnumerable.ToList()
Yes, IEnumerable<T>.ToList() does have a performance impact: it is an O(n) operation, though it will likely only require attention in performance-critical code.
The ToList() operation will use the List(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally, of the IEnumerable<T>); otherwise future modifications made through the list would also change the source T[], which generally wouldn't be desirable.
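The copy semantics described above can be checked with a small sketch. The class and method names here (ToListCopyDemo, MutateCopy) and the file names are made up for illustration; only ToList() itself comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCopyDemo
{
    // Hypothetical helper: copies an array with ToList(), mutates the copy,
    // and returns the original array so the caller can inspect it.
    public static string[] MutateCopy()
    {
        string[] files = { "a.png", "b.png" };
        List<string> list = files.ToList();

        // These changes touch only the list's own backing array.
        list[0] = "changed.png";
        list.Add("c.png");

        return files;
    }

    public static void Main()
    {
        string[] files = MutateCopy();
        Console.WriteLine(string.Join(", ", files)); // prints "a.png, b.png"
    }
}
```

Note that the copy is shallow: for reference types the list holds the same object references as the array, so only the container is duplicated.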
I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation.
Handy tip: As vs To
You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the one above (i.e. they may impact performance), while the methods that start with As do not and just require a cast or some other simple operation.
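As a rough illustration of the As/To distinction, the sketch below compares object identity. The class and method names are hypothetical; AsEnumerable() and ToList() are the real LINQ methods mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsVersusToDemo
{
    // AsEnumerable() only changes the compile-time type; it hands back the same object.
    public static bool AsReturnsSameObject(int[] source)
    {
        IEnumerable<int> view = source.AsEnumerable();
        return ReferenceEquals(view, source);
    }

    // ToList() allocates a new List<int> and copies the elements into it.
    public static bool ToReturnsSameObject(int[] source)
    {
        List<int> copy = source.ToList();
        return ReferenceEquals(copy, source);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        Console.WriteLine(AsReturnsSameObject(numbers)); // prints "True"
        Console.WriteLine(ToReturnsSameObject(numbers)); // prints "False"
    }
}
```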
Additional details on List<T>
Here is a little more detail on how List<T> works, in case you're interested :)
A List<T> also uses a construct called a dynamic array, which needs to be resized on demand; this resize event copies the contents of the old array to a new array. So it starts off small and increases in size if required.
This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when an item is added to the list and Count would grow past Capacity, the capacity of the List<T> is doubled and the array is copied.
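The relationship between Count and Capacity can be observed directly. This is only a sketch: the class and method names are invented for illustration, and the exact growth sequence is an implementation detail of the runtime rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityGrowthDemo
{
    // Hypothetical helper: adds itemCount items one by one and records
    // every new Capacity value the list reports along the way.
    public static List<int> RecordCapacities(int itemCount)
    {
        var list = new List<int>();
        var capacities = new List<int>();
        int lastCapacity = list.Capacity;

        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // The backing array was reallocated and the old contents copied.
                lastCapacity = list.Capacity;
                capacities.Add(lastCapacity);
            }
        }

        return capacities;
    }

    public static void Main()
    {
        // Each printed value marks one reallocation of the backing array.
        Console.WriteLine(string.Join(", ", RecordCapacities(100)));
    }
}
```

If the final size is known in advance, passing it to the List<T>(int capacity) constructor should avoid these intermediate reallocations.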
Answer by Haris Hasan
ToList() creates a new List and puts the elements in it, which means there is a cost associated with calling ToList(). For a small collection the cost won't be very noticeable, but with a huge collection ToList can cause a performance hit.
Generally you should not use ToList() unless the work you are doing cannot be done without converting the collection to a List. For example, if you just want to iterate through the collection, you don't need to call ToList.
If you are performing queries against a data source, for example a database using LINQ to SQL, then the cost of ToList is much higher: instead of deferred execution, i.e. loading items only when needed (which can be beneficial in many scenarios), ToList instantly loads the items from the database into memory.
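The difference between deferred execution and eager loading can be sketched without a database. In the example below, ReadRows is a hypothetical in-memory stand-in for a data source (it is not LINQ to SQL), and RowsRead simply counts how many items were actually produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many "rows" the fake data source has produced so far.
    public static int RowsRead;

    // Hypothetical stand-in for a query: rows are produced lazily, one per MoveNext().
    public static IEnumerable<int> ReadRows()
    {
        for (int id = 1; id <= 1000; id++)
        {
            RowsRead++;
            yield return id;
        }
    }

    // Lazy: First() stops after pulling a single row.
    public static int RowsReadByLazyFirst()
    {
        RowsRead = 0;
        ReadRows().First();
        return RowsRead;
    }

    // Eager: ToList() pulls every row before First() even runs.
    public static int RowsReadByFirstAfterToList()
    {
        RowsRead = 0;
        ReadRows().ToList().First();
        return RowsRead;
    }

    public static void Main()
    {
        Console.WriteLine(RowsReadByLazyFirst());        // prints "1"
        Console.WriteLine(RowsReadByFirstAfterToList()); // prints "1000"
    }
}
```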
Answer by Mohammad Dehghan
Compared with the cost of retrieving the file list, the cost of ToList() is negligible. But that is not really true for other scenarios. It really depends on where you are using it.
When calling on an array, list, or other collection, you create a copy of the collection as a List<T>. The performance here depends on the size of the list. You should do it only when really necessary. In your example, you call it on an array. It iterates over the array and adds the items one by one to a newly created list. So the performance impact depends on the number of files.
When calling on an IEnumerable<T>, you materialize the IEnumerable<T> (usually a query).
Answer by TalentTuner
ToList will create a new list and copy the elements from the original source into it, so the only cost is copying the elements, and that cost depends on the size of the source.
Answer by Oscar Mederos
It will be as (in)efficient as doing:
var list = new List<T>(items);
If you disassemble the constructor that takes an IEnumerable<T>, you will see it does a few things:
Call collection.Count, so if collection is an IEnumerable<T>, it will force the execution. If collection is an array, list, etc. it should be O(1).
If collection implements ICollection<T>, it will save the items in an internal array using the ICollection<T>.CopyTo method. It should be O(n), n being the length of the collection.
If collection does not implement ICollection<T>, it will iterate through the items of the collection and add them to an internal list.
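The branching described above can be approximated as follows. This is a simplified sketch, not the actual List<T> source: CopyToArray and LazyNumbers are hypothetical names, and the initial buffer size of 4 with doubling is an assumption chosen to mirror the growth behaviour described elsewhere on this page.

```csharp
using System;
using System.Collections.Generic;

public static class ListCopySketch
{
    // Hypothetical helper mirroring the two copy paths of the constructor.
    public static T[] CopyToArray<T>(IEnumerable<T> collection)
    {
        if (collection == null) throw new ArgumentNullException(nameof(collection));

        if (collection is ICollection<T> sized)
        {
            // Size is known up front: allocate once and bulk-copy with CopyTo.
            var items = new T[sized.Count];
            sized.CopyTo(items, 0);
            return items;
        }

        // Size is unknown: enumerate and grow the buffer whenever it fills up.
        var buffer = new T[4];
        int count = 0;
        foreach (T item in collection)
        {
            if (count == buffer.Length)
            {
                Array.Resize(ref buffer, buffer.Length * 2); // allocates and copies
            }
            buffer[count++] = item;
        }

        Array.Resize(ref buffer, count); // trim to the number of items actually stored
        return buffer;
    }

    // A lazy sequence that does not implement ICollection<T>.
    public static IEnumerable<int> LazyNumbers(int count)
    {
        for (int i = 1; i <= count; i++) yield return i;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CopyToArray(new[] { 1, 2, 3 }))); // prints "1, 2, 3"
        Console.WriteLine(CopyToArray(LazyNumbers(10)).Length);               // prints "10"
    }
}
```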
So, yes, it will consume more memory, since it has to create a new list, and in the worst case it will be O(n), since it will iterate through the collection to make a copy of each element.
Answer by Cheng Chen
Is there a performance impact when calling toList()?
Yes, of course. Theoretically even i++ has a performance impact; it slows the program down by maybe a few ticks.
What does .ToList do?
When you invoke .ToList, the code calls Enumerable.ToList(), which is an extension method that does return new List<TSource>(source). In the corresponding constructor, in the worst case, it goes through the item container and adds the items one by one into a new container. So its behavior has little effect on performance. It is very unlikely to be the performance bottleneck of your application.
What's wrong with the code in the question
Directory.GetFiles goes through the folder and returns all the files' names into memory immediately. There is a potential risk that the string[] costs a lot of memory, slowing down everything.
What should be done then
It depends. If you (as well as your business logic) guarantee that the number of files in the folder is always small, the code is acceptable. But it's still suggested to use the lazy version, Directory.EnumerateFiles, available since .NET 4 (C# 4). This is much more like a query, which will not be executed immediately; you can chain more query operators onto it, like:
Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))
which will stop searching the path as soon as a file whose name contains "myfile" is found. This obviously performs better than .GetFiles.
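A self-contained version of that query might look like the sketch below. The class, the helper names, the temporary directory and the sample file names are all invented for illustration; Directory.EnumerateFiles and Any are the real APIs from the answer. Unlike the one-liner above, this version matches against the file name only rather than the full path.

```csharp
using System;
using System.IO;
using System.Linq;

public static class EnumerateFilesDemo
{
    // Lazily walks the directory; Any() stops enumerating at the first match.
    public static bool ContainsFileNamed(string directory, string fragment)
    {
        return Directory.EnumerateFiles(directory)
                        .Any(path => Path.GetFileName(path).Contains(fragment));
    }

    // Hypothetical helper: creates a throwaway directory with two empty files.
    public static string CreateSampleDirectory()
    {
        string dir = Path.Combine(Path.GetTempPath(), "tolist-demo-" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        File.WriteAllText(Path.Combine(dir, "myfile-1.txt"), "");
        File.WriteAllText(Path.Combine(dir, "other.txt"), "");
        return dir;
    }

    public static void Main()
    {
        string dir = CreateSampleDirectory();
        try
        {
            Console.WriteLine(ContainsFileNamed(dir, "myfile"));  // prints "True"
            Console.WriteLine(ContainsFileNamed(dir, "missing")); // prints "False"
        }
        finally
        {
            Directory.Delete(dir, true); // clean up the sample directory
        }
    }
}
```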
Answer by jross
"is there a performance impact that needs to be considered?"
The issue with your precise scenario is that, first and foremost, your real performance concern would be the speed of the hard drive and the efficiency of the drive's cache.
From that perspective, the impact is surely negligible, to the point that NO, it need not be considered.
BUT ONLY if you really need the features of the List<> structure, whether to make you more productive, to make your algorithm friendlier, or for some other advantage. Otherwise, you're just purposely adding an insignificant performance hit for no reason at all. In which case, naturally, you shouldn't do it! :)
Answer by Martin Liversage
Is there a performance impact when calling toList()?
Yes there is. Using the extension method Enumerable.ToList() will construct a new List<T> object from the IEnumerable<T> source collection, which of course has a performance impact.
However, understanding List<T> may help you determine if the performance impact is significant.
List<T> uses an array (T[]) to store the elements of the list. Arrays cannot be extended once they are allocated, so List<T> will use an over-sized array to store the elements of the list. When the List<T> grows beyond the size of the underlying array, a new array has to be allocated and the contents of the old array have to be copied to the new, larger array before the list can grow.
When a new List<T> is constructed from an IEnumerable<T> there are two cases:
The source collection implements ICollection<T>: Then ICollection<T>.Count is used to get the exact size of the source collection, and a matching backing array is allocated before all elements of the source collection are copied to the backing array using ICollection<T>.CopyTo(). This operation is quite efficient and probably will map to some CPU instruction for copying blocks of memory. However, in terms of performance, memory is required for the new array and CPU cycles are required for copying all the elements.
Otherwise the size of the source collection is unknown and the enumerator of IEnumerable<T> is used to add each source element one at a time to the new List<T>. Initially the backing array is empty and an array of size 4 is created. Then when this array is too small the size is doubled, so the backing array grows like this: 4, 8, 16, 32, etc. Every time the backing array grows it has to be reallocated and all elements stored so far have to be copied. This operation is much more costly compared to the first case, where an array of the correct size can be created right away. Also, if your source collection contains, say, 33 elements, the list will end up using an array of 64 elements, wasting some memory.
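The two cases can be compared by inspecting Capacity after construction. This is only a sketch with hypothetical names (TwoCasesDemo, LazyRange); the exact capacities are runtime implementation details, so the code prints them instead of assuming particular values.

```csharp
using System;
using System.Collections.Generic;

public static class TwoCasesDemo
{
    // A lazy source that does not implement ICollection<T>, so its size is unknown up front.
    public static IEnumerable<int> LazyRange(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }

    // Case 1: the source is an array, which implements ICollection<T>.
    public static List<int> FromArray(int count)
    {
        return new List<int>(new int[count]);
    }

    // Case 2: the source can only be enumerated.
    public static List<int> FromLazy(int count)
    {
        return new List<int>(LazyRange(count));
    }

    public static void Main()
    {
        List<int> a = FromArray(33);
        List<int> b = FromLazy(33);

        // Compare how much spare room each construction path leaves behind.
        Console.WriteLine("array source: Count=" + a.Count + ", Capacity=" + a.Capacity);
        Console.WriteLine("lazy source:  Count=" + b.Count + ", Capacity=" + b.Capacity);
    }
}
```

If the extra capacity matters, calling TrimExcess() on the resulting list asks it to shrink the backing array toward Count, at the cost of one more copy.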
In your case the source collection is an array, which implements ICollection<T>, so the performance impact is not something you should be concerned about unless your source array is very large. Calling ToList() will simply copy the source array and wrap it in a List<T> object. Even the performance of the second case is not something to worry about for small collections.