Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/111026/
Large array arithmetics in C#
Asked by AnnaR
What is the best way to store a 2D array in C# to optimize performance when performing lots of arithmetic on its elements?
We have large (approx. 1.5G) arrays which we want, for example, to multiply with each other element by element. Performance is critical, and all of this happens in a C# context. Is there any smart way of storing the arrays and iterating over them? Could we write these parts in unmanaged C++, and would this really increase performance? The arrays need to be accessible to the rest of the C# program.
Currently (in C) the array is stored as a single long vector. We perform calculations on each element in the array and overwrite the old value. The calculations are usually unique for each element in the vector.
Timing experiments show that storing and iterating over the data as a single one-dimensional array in C# is slower than storing it as a 2D array. I would like to know if there is an even better way of handling the data. The specific arithmetic performed is not relevant to the question.
Answered by Cameron MacFarland
For best array performance, make sure you're using a single-dimension array with a lower bound of 0.
To access the elements of the array as fast as possible, you can use unsafe pointers like so:
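    // Requires "using System.Linq;" and compiling with unsafe code enabled (/unsafe).
    int[] array = Enumerable.Range(0, 1000).ToArray();
    int count = 0;
    unsafe {
        // Pin the array so the garbage collector cannot move it while we hold a raw pointer.
        fixed (int* pArray = array) {
            for (int i = 0; i < array.Length; i++) {
                count += *(pArray + i); // pointer access skips the array bounds check
            }
        }
    }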
EDIT: Drat! Didn't notice you said 2D array. This trick won't work with a multi-dimensional array, so I'm not sure how much help it will be. You could, however, turn any array into a single-dimension array by doing some arithmetic on the array index. It just depends on whether you care more about the performance hit in indexing the array or in iterating over it.
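The index arithmetic mentioned above could look something like the following. This is only a minimal sketch assuming row-major storage and element type double; the class name FlatMatrix and its methods are made up for illustration and are not part of the original answer.

```csharp
using System;

public static class FlatMatrix
{
    // Map a (row, col) pair to an index in a row-major one-dimensional array.
    public static int Index(int row, int col, int cols)
    {
        return row * cols + col;
    }

    // Multiply two equally sized flat arrays element by element, overwriting a.
    public static void MultiplyInPlace(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");
        for (int i = 0; i < a.Length; i++)
            a[i] *= b[i];
    }
}
```

Element (row, col) of a matrix with cols columns is then read as data[FlatMatrix.Index(row, col, cols)]. For purely element-wise work such as MultiplyInPlace, the 2D shape does not matter and a single loop over the flat array is enough.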
Answered by Jason Stevenson
Anna,
Here is a great page that discusses the performance difference between traditional scientific programming languages (Fortran, C++) and C#.
http://msdn.microsoft.com/en-us/magazine/cc163995.aspx
According to the article, C# can be a very good performer when using rectangular (2D) arrays. Here is a graph that shows the difference in performance between jagged arrays (an array of arrays) and rectangular (multi-dimensional) arrays.
Graph: http://i.msdn.microsoft.com/cc163995.fig08.gif
I would suggest experimenting yourself and using the Performance Analysis tools in VS 2008 to compare.
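As a starting point for such an experiment, here is a minimal sketch of the two layouts being compared. The class and method names (ArrayLayouts, SumRectangular, SumJagged) are hypothetical, and summing is just a stand-in for whatever arithmetic you actually perform.

```csharp
using System;

public static class ArrayLayouts
{
    // Sum a rectangular (multi-dimensional) array: one object, indexed [row, col].
    public static double SumRectangular(double[,] m)
    {
        double sum = 0;
        for (int r = 0; r < m.GetLength(0); r++)
            for (int c = 0; c < m.GetLength(1); c++)
                sum += m[r, c];
        return sum;
    }

    // Sum a jagged array: an array of independent row arrays, indexed [row][col].
    public static double SumJagged(double[][] m)
    {
        double sum = 0;
        for (int r = 0; r < m.Length; r++)
            for (int c = 0; c < m[r].Length; c++)
                sum += m[r][c];
        return sum;
    }
}
```

Wrapping each call in a System.Diagnostics.Stopwatch and running it on arrays of realistic size should show which layout is faster on your runtime and hardware; the result may well differ from the article's graph.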
If using C# is "fast enough" then your application will be that much easier to maintain.
Good Luck!
Answered by TraumaPony
If you download F#, you can reference one of the runtime libraries (I think it's FSharp.PowerPack) and use Microsoft.FSharp.Maths.Matrix. It optimises itself based on whether you are using a dense or sparse matrix.
Answered by Nils Pipenbrinck
Do you iterate over the matrix by row, by column, or both? Do you always access nearby elements, or do you do random accesses on the matrix?
If there is some locality in your accesses but you're not accessing the data sequentially (typical in matrix multiplication, for example), then you can get a huge performance difference by storing your matrix in a more cache-friendly way.
A pretty easy way to do that is to write a little access function that turns your row/column indices into a single index, and to work on a one-dimensional array laid out in a cache-friendly order.
The function should group nearby coordinates into nearby indices. The Morton order can be used if you work with power-of-two sizes. For non-power-of-two sizes you can often bring just the lowest 4 bits into Morton order and use normal index arithmetic for the upper bits. You'll still get a significant speed-up, even if the coordinate-to-index conversion looks like a costly operation.
http://en.wikipedia.org/wiki/Z-order_(curve) <-- sorry, I can't link that; SO does not like URLs with a dash in them. You have to cut'n'paste.
A speed-up by a factor of 10 or more is realistic, btw. It depends on the algorithm you run over your matrices, though.
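For the power-of-two case, the Morton-order access function described above could be sketched as follows. This is an illustration only: the names Morton, Part1By1 and Encode are made up here, the bit-spreading uses the common mask-and-shift technique, and coordinates are assumed to fit in 16 bits each.

```csharp
using System;

public static class Morton
{
    // Spread the low 16 bits of x so that a zero bit sits between each pair of bits.
    private static uint Part1By1(uint x)
    {
        x &= 0x0000FFFF;
        x = (x | (x << 8)) & 0x00FF00FF;
        x = (x | (x << 4)) & 0x0F0F0F0F;
        x = (x | (x << 2)) & 0x33333333;
        x = (x | (x << 1)) & 0x55555555;
        return x;
    }

    // Interleave the bits: column bits land in even positions, row bits in odd positions.
    public static uint Encode(uint row, uint col)
    {
        return (Part1By1(row) << 1) | Part1By1(col);
    }
}
```

For a square matrix whose side n is a power of two, the data would live in an array of n * n elements and element (row, col) would be read as data[Morton.Encode(row, col)]. Elements within any aligned 2x2, 4x4, 8x8, ... block then occupy one contiguous index range, which is what makes the layout cache-friendly.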