C - Row-major vs column-major confusion
Original URL: http://stackoverflow.com/questions/33862730/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Row-major vs Column-major confusion
Asked by vexe
I've been reading a lot about this, the more I read the more confused I get.
My understanding: in row-major order, rows are stored contiguously in memory; in column-major order, columns are stored contiguously in memory. So if we have a sequence of numbers [1, ..., 9] and we want to store them in a row-major matrix, we get:
|1, 2, 3|
|4, 5, 6|
|7, 8, 9|
while the column-major (correct me if I'm wrong) is:
|1, 4, 7|
|2, 5, 8|
|3, 6, 9|
which is effectively the transpose of the previous matrix.
My confusion: well, I don't see any difference. If we iterate over both matrices (by rows in the first one, and by columns in the second one), we'll cover the same values in the same order: 1, 2, 3, ..., 9
Even matrix multiplication is the same: we take the first contiguous elements and multiply them with the second matrix's columns. So say we have the matrix M:
|1, 0, 4|
|5, 2, 7|
|6, 0, 0|
If we multiply the previous row-major matrix R with M, that is R x M, we'll get:
|1*1 + 2*5 + 3*6, 1*0 + 2*2 + 3*0, 1*4 + 2*7 + 3*0|
|etc.. |
|etc.. |
If we multiply the column-major matrix C with M, that is C x M, by taking the columns of C instead of its rows, we get exactly the same result as R x M.
I'm really confused: if everything is the same, why do these two terms even exist? I mean, even in the first matrix R, I could look at the rows and consider them columns...
Am I missing something? What does row-major vs column-major actually imply for my matrix math? I've always learned in my Linear Algebra classes that we multiply rows from the first matrix with columns from the second one. Does that change if the first matrix is in column-major order? Do we now have to multiply its columns with columns from the second matrix, like I did in my example, or was that just flat-out wrong?
Any clarifications are really appreciated!
EDIT: One of the other main sources of confusion for me is GLM... So I hover over its matrix type and hit F12 to see how it's implemented. There I see a vector array, so a 3x3 matrix is an array of 3 vectors. Looking at the type of those vectors I saw 'col_type', so I assumed that each of those vectors represents a column, and thus we have a column-major system, right?
Well, I don't know, to be honest. I wrote this print function to compare my translation matrix with glm's: I see the translation vector in glm in the last row, while mine is in the last column...
This adds nothing but more confusion. You can clearly see that each vector in the glmTranslate matrix represents a row in the matrix. So... that means the matrix is row-major, right? What about my matrix? (I'm using a float array[16].) The translation values are in the last column; does that mean my matrix is column-major and I didn't know it? *tries to stop head from spinning*
Accepted answer by decltype_auto
Let's look at algebra first; algebra doesn't even have a notion of "memory layout" and stuff.
From an algebraic point of view, an $M \times N$ real matrix can act on a vector in $\mathbb{R}^N$ on its right side and yield a vector in $\mathbb{R}^M$.
Thus, if you were sitting in an exam and were given an $M \times N$ matrix and a vector in $\mathbb{R}^N$, you could multiply them with trivial operations and get a result. Whether that result is right or wrong will not depend on whether the software your professor uses to check your results internally uses a column-major or a row-major layout; it will only depend on whether you properly calculated the contraction of each row of the matrix with the (single) column of the vector.
To produce a correct output, the software will - by whatever means - essentially have to contract each row of the Matrix with the column vector, just like you did in the exam.
Thus, the difference between software that uses a column-major layout and software that uses a row-major layout is not what it calculates, but just how.
To put it more precisely: with regard to contracting a single row with the column vector, the only difference between those layouts is how they answer the question:
Where is the next element of the current row?
- For a row-major layout, it's the element in the very next bucket in memory.
- For a column-major layout, it's the element M buckets away.
And that's it.
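To make the "next bucket" rule concrete, here is a minimal C sketch (the function names are hypothetical, not from any library) that multiplies an M×N matrix by a vector in R^N. Both functions perform the same row-by-column contraction; they differ only in the stride used to step along a row: 1 for row-major storage, m for column-major storage.

```c
#include <stddef.h>

/* y = A * x for an m-by-n matrix A stored row-major:
   element (i, j) lives at a[i * n + j], so the next element of a row is 1 bucket away. */
void matvec_row_major(const double *a, size_t m, size_t n,
                      const double *x, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Same product for an m-by-n matrix stored column-major:
   element (i, j) lives at a[i + j * m], so the next element of a row is m buckets away. */
void matvec_col_major(const double *a, size_t m, size_t n,
                      const double *x, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i + j * m] * x[j];
        y[i] = sum;
    }
}
```

For the 2×3 matrix [[1, 2, 3], [4, 5, 6]] stored as {1, 2, 3, 4, 5, 6} (row-major) or {1, 4, 2, 5, 3, 6} (column-major), with x = {1, 1, 1}, both functions produce y = {6, 15}: same math, different walk through memory.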
To show you how that column/row magic is summoned in practice:
You haven't tagged your question with "c++", but because you mentioned 'glm', I assume that you can get along with C++.
In C++'s standard library there's an infamous beast called valarray, which, besides other tricky features, has overloads of operator[]; one of them takes a std::slice (which is essentially a very boring thing, consisting of just three integer-type numbers).
This little slice thing, however, has everything one would need to access a row-major storage column-wise or a column-major storage row-wise: it has a start, a length, and a stride. The latter represents the "distance to the next bucket" I mentioned.
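C has no valarray, but the same start/length/stride idea can be sketched with a small struct. The struct slice and the helpers below are hypothetical, modelled on the three numbers of std::slice (start, size, stride); they are not a standard C API. Choosing the stride is enough to read a row or a column from either layout.

```c
#include <stddef.h>

/* Hypothetical C analogue of std::slice: start index, number of elements, stride. */
struct slice {
    size_t start;
    size_t size;
    size_t stride;
};

/* Copy the elements selected by s from data into out (out must hold s.size elements). */
void gather_slice(const int *data, struct slice s, int *out)
{
    for (size_t k = 0; k < s.size; k++)
        out[k] = data[s.start + k * s.stride];
}

/* Column j of an m-by-n matrix stored row-major: start j, m elements, stride n. */
struct slice row_major_column(size_t m, size_t n, size_t j)
{
    struct slice s = { j, m, n };
    return s;
}

/* Row i of an m-by-n matrix stored column-major: start i, n elements, stride m. */
struct slice col_major_row(size_t m, size_t n, size_t i)
{
    struct slice s = { i, n, m };
    return s;
}
```

With the 3×3 matrix 1..9, the row-major buffer {1, ..., 9} read through row_major_column(3, 3, 1) yields {2, 5, 8}, and the column-major buffer {1, 4, 7, 2, 5, 8, 3, 6, 9} read through col_major_row(3, 3, 1) yields {4, 5, 6}.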
Answer by thurizas
I think you are mixing up an implementation detail with usage, if you will.
Let's start with a two-dimensional array, or matrix:
| 1 2 3 |
| 4 5 6 |
| 7 8 9 |
The problem is that computer memory is a one-dimensional array of bytes. To make our discussion easier, let's group the single bytes into groups of four, so we have something that looks like this (each +-+ represents a byte, and four bytes represent an integer value, assuming a 32-bit int):
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       |       |       |       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\/      \       /
one byte one integer
low memory ------> high memory
So, the question is how to map a two dimensional structure (our matrix) onto this one dimensional structure (i.e. memory). There are two ways of doing this.
Row-major order: In this order we put the first row in memory first, and then the second, and so on. Doing this, we would have in memory the following:
+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+---+---+---+---+---+---+---+---+---+
With this method, we can find a given element of our array with the following arithmetic. Suppose we want to access the element $M_{ij}$ of the array. If we have a pointer to the first element of the array, say ptr, and know the number of columns, say nCol, then $M_{ij}$ sits at this offset from ptr, i.e. at ptr[i*nCol + j]:
$\text{offset}(M_{ij}) = i \cdot nCol + j$
To see how this works, consider $M_{02}$ (i.e. first row, third column; remember that C is zero-based):
$\text{offset}(M_{02}) = 0 \cdot 3 + 2 = 2$
So we access the third element of the array.
Column-major order: In this order we put the first column in memory first, then the second, and so on. Doing this, we would have in memory the following:
+---+---+---+---+---+---+---+---+---+
| 1 | 4 | 7 | 2 | 5 | 8 | 3 | 6 | 9 |
+---+---+---+---+---+---+---+---+---+
So, the short answer: row-major and column-major formats describe how two- (or higher-) dimensional arrays are mapped into a one-dimensional array of memory.
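The two mappings above can be sketched in a few lines of C. The function names and the fixed 3×3 size (ROWS, COLS) are assumptions for illustration; the point is that the same matrix produces the two memory images shown in the diagrams, with element (i, j) going to offset i * COLS + j in one case and j * ROWS + i in the other.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 3

/* Flatten a ROWS x COLS matrix into memory in row-major order:
   element (i, j) goes to offset i * COLS + j. */
void to_row_major(int m[ROWS][COLS], int out[ROWS * COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            out[i * COLS + j] = m[i][j];
}

/* Flatten the same matrix in column-major order:
   element (i, j) goes to offset j * ROWS + i. */
void to_col_major(int m[ROWS][COLS], int out[ROWS * COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            out[j * ROWS + i] = m[i][j];
}
```

Fed the matrix {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, to_row_major produces 1 2 3 4 5 6 7 8 9 and to_col_major produces 1 4 7 2 5 8 3 6 9.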
Hope this helps. T.
Answer by Matthew Gunn
Doesn't matter what you use: just be consistent!
Row major or column major is just a convention. It doesn't matter. C uses row major, Fortran uses column major. Both work. Use what's standard in your programming language/environment.
Mismatching the two will !@#$ stuff up
If you use row-major addressing on a matrix stored in column major, you can get the wrong element, read past the end of the array, etc...
Row major: A(i,j) element is at A[j + i * n_columns]; <---- mixing these up will
Col major: A(i,j) element is at A[i + j * n_rows]; <---- make your code fubar
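A minimal C sketch of the two addressing rules above, with hypothetical helper names. It assumes a 2×3 matrix [[1, 2, 3], [4, 5, 6]] whose buffer was written in column-major order; reading it back with the row-major formula does not fail, it silently returns a different element.

```c
#include <stddef.h>

/* Offset of A(i, j) in an n_rows x n_columns matrix, row-major convention. */
size_t row_major_offset(size_t i, size_t j, size_t n_rows, size_t n_columns)
{
    (void)n_rows;               /* not needed for row-major addressing */
    return j + i * n_columns;
}

/* Offset of A(i, j) in an n_rows x n_columns matrix, column-major convention. */
size_t col_major_offset(size_t i, size_t j, size_t n_rows, size_t n_columns)
{
    (void)n_columns;            /* not needed for column-major addressing */
    return i + j * n_rows;
}
```

With the column-major buffer {1, 4, 2, 5, 3, 6}, col_major_offset(0, 1, 2, 3) points at 2, which is A(0,1), while row_major_offset(0, 1, 2, 3) points at 4, which is actually A(1,0).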
It's incorrect to say code to do matrix multiplication is the same for row major and column major
(Of course the math of matrix multiplication is the same.) Imagine you have two arrays in memory:
X = [x1, x2, x3, x4] Y = [y1, y2, y3, y4]
If matrices are stored in column major then X, Y, and X*Y are:
IF COL MAJOR: [x1, x3   *   [y1, y3   =   [x1y1+x3y2, x1y3+x3y4
               x2, x4]       y2, y4]       x2y1+x4y2, x2y3+x4y4]
If matrices are stored in row major then X, Y, and X*Y are:
IF ROW MAJOR: [x1, x2   *   [y1, y2   =   [x1y1+x2y3, x1y2+x2y4;
               x3, x4]       y3, y4]       x3y1+x4y3, x3y2+x4y4];
X*Y in memory if COL major: [x1y1+x3y2, x2y1+x4y2, x1y3+x3y4, x2y3+x4y4]
              if ROW major: [x1y1+x2y3, x1y2+x2y4, x3y1+x4y3, x3y2+x4y4]
There's nothing deep going on here. It's just two different conventions. It's like measuring in miles or kilometers. Either works, you just can't flip back and forth between the two without converting!
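The two in-memory results listed above can be checked with a small C sketch. The function matmul2 and its col_major flag are hypothetical; it multiplies two 2×2 matrices given as four values in memory order, interpreting them under one convention or the other.

```c
/* Multiply two 2x2 matrices whose entries are given in memory order.
   col_major != 0: a[i + 2*j] is element (i, j); otherwise a[2*i + j] is.
   The result is written back in the same convention. */
void matmul2(const int *x, const int *y, int *out, int col_major)
{
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int sum = 0;
            for (int k = 0; k < 2; k++) {
                int xik = col_major ? x[i + 2 * k] : x[2 * i + k];
                int ykj = col_major ? y[k + 2 * j] : y[2 * k + j];
                sum += xik * ykj;
            }
            if (col_major)
                out[i + 2 * j] = sum;
            else
                out[2 * i + j] = sum;
        }
    }
}
```

With X = {1, 2, 3, 4} and Y = {5, 6, 7, 8} in memory, the column-major call writes {23, 34, 31, 46} and the row-major call writes {19, 22, 43, 50}: the same bytes, two different products.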
Answer by Y.C.Jung
You are right. It doesn't matter whether a system stores the data in a row-major or a column-major structure. It is just like a protocol. Computer: "Hey, human. I'm going to store your array this way. No prob. Huh?" However, when it comes to performance, it matters. Consider the following three things:
1. Most arrays are accessed in row-major order.
2. When you access memory, the data is not read directly from memory. Blocks of data are first loaded from memory into the cache, and then the processor reads the data from the cache.
3. If the data you want is not in the cache, the cache has to fetch it from memory again.
When a cache fetches data from memory, locality is important. That is, if the data you access is scattered sparsely in memory, your cache has to fetch data from memory more often. This hurts your program's performance, because accessing memory is far slower (over 100 times!) than accessing the cache. The less you access memory, the faster your program. So, a row-major array traversed row by row is more efficient, because the data it accesses is more likely to be local.
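The locality argument can be illustrated with a rough, deterministic model rather than a timing benchmark. This C sketch counts how often a traversal of an n×n int array (C stores 2-D arrays row-major) moves to a different cache line. The 64-byte line size (LINE_BYTES), the line-aligned base address, and the helper name are all assumptions for illustration; real cache geometry varies by processor.

```c
#include <stddef.h>

/* Assumed cache line size; real hardware varies. */
#define LINE_BYTES 64

/* Count accesses that land on a different cache line from the previous access,
   for an n x n int matrix stored row-major with a line-aligned base (assumption).
   row_order != 0: visit (a, b), walking along rows; otherwise visit (b, a),
   walking down columns. */
size_t count_line_changes(size_t n, int row_order)
{
    size_t changes = 0;
    size_t prev_line = (size_t)-1;   /* sentinel: no line touched yet */
    for (size_t a = 0; a < n; a++) {
        for (size_t b = 0; b < n; b++) {
            size_t i = row_order ? a : b;
            size_t j = row_order ? b : a;
            size_t line = (i * n + j) * sizeof(int) / LINE_BYTES;
            if (line != prev_line) {
                changes++;
                prev_line = line;
            }
        }
    }
    return changes;
}
```

For n = 1024 the row-order walk changes lines only once every LINE_BYTES / sizeof(int) accesses, while the column-order walk jumps n * sizeof(int) bytes each step and so lands on a new line on every single access.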
Answer by eulerworks
Ok, so given that the word "confusion" is literally in the title, I can understand the level of... confusion.
Firstly, this absolutely is a real problem.
Never, EVER succumb to the idea that "it used to be, but... PCs nowadays..."
The primary issues here are:
- Cache eviction strategy (LRU, FIFO, etc.), as @Y.C.Jung was beginning to touch on
- Branch prediction
- Pipelining (its depth, etc.)
- Actual physical memory layout
- Size of memory
- Architecture of the machine (ARM, MIPS, Intel, AMD, Motorola, etc.)
This answer will focus on the von Neumann machine (with the modified-Harvard split caches found in modern CPUs), as it is most applicable to the current PC.
The memory hierarchy:
https://en.wikipedia.org/wiki/File:ComputerMemoryHierarchy.svg
It is a juxtaposition of cost versus speed.
For today's standard PC system this would be something like:
SIZE: 500GB HDD > 8GB RAM > L2 Cache > L1 Cache > Registers
SPEED: 500GB HDD < 8GB RAM < L2 Cache < L1 Cache < Registers
This leads to the ideas of temporal and spatial locality. Temporal locality concerns when your data is reused (the same code or working set accessed again soon); spatial locality concerns where your data physically sits in "memory" (neighbouring addresses accessed together).
Given that "most" of today's PCs are little-endian (Intel) machines, they lay multi-byte data into memory in little-endian byte order, which does differ fundamentally from big-endian.
https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html (covers it rather... swiftly ;) )
(For the simplicity of this example, I am going to 'say' that things happen in single entries. This is incorrect: entire cache blocks are typically accessed, and their sizes vary drastically by manufacturer, let alone by model.)
So, now that we have that out of the way: suppose, hypothetically, your program demanded 1GB of data from your 500GB HDD, loaded into your 8GB of RAM, then into the cache hierarchy, and eventually into registers. Your program reads the first entry from your freshest cache line, only for the second desired entry (in YOUR code) to be sitting in the next cache line (i.e. the next ROW instead of the next column). You would have a cache MISS.
Assuming the cache is full (because it is small), upon a miss a line would be evicted, according to the eviction scheme, to make room for the line that 'does' have the next data you need. If this pattern repeated, you would have a MISS on EVERY attempted data retrieval!
Worse, you would be evicting lines that actually have valid data you are about to need, so you will have to retrieve them AGAIN and AGAIN.
The term for this is thrashing.
See https://en.wikipedia.org/wiki/Thrashing_(computer_science). It can indeed crash a poorly written or error-prone system (think Windows BSOD)...
On the other hand, if you had laid out the data properly (i.e. row major)... you WOULD still have misses!
But these misses would only occur at the end of each retrieval, not on EVERY attempted retrieval. This results in orders of magnitude of difference in system and program performance.
Very very simple snippet:
#include <stdio.h>
#define NUM_ROWS 1024
#define NUM_COLS 1024
int COL_MAJOR[NUM_ROWS][NUM_COLS];
int main(void) {
    int i, j;
    for (i = 0; i < NUM_ROWS; i++) {
        for (j = 0; j < NUM_COLS; j++) {   // j restarts at 0 on every pass of the outer loop
            COL_MAJOR[j][i] = (i + j);     // NOTE i,j order here! Walks down a column.
        } // end inner for
    } // end outer for
    return 0;
} // end main
Now, compile with: gcc -g col_maj.c -o col.o
Now, run with: time ./col.o
real 0m0.009s
user 0m0.003s
sys 0m0.004s
Now repeat for ROW major:
#include <stdio.h>
#define NUM_ROWS 1024
#define NUM_COLS 1024
int ROW_MAJOR[NUM_ROWS][NUM_COLS];
int main(void) {
    int i, j;
    for (i = 0; i < NUM_ROWS; i++) {
        for (j = 0; j < NUM_COLS; j++) {   // j restarts at 0 on every pass of the outer loop
            ROW_MAJOR[i][j] = (i + j);     // NOTE i,j order here! Walks along a row.
        } // end inner for
    } // end outer for
    return 0;
} // end main
Compile: gcc -g row_maj.c -o row.o
Run: time ./row.o
real 0m0.005s
user 0m0.001s
sys 0m0.003s
Now, as you can see, the row-major one was significantly faster.
Not convinced? If you would like to see a more drastic example: make the matrix 1000000 x 1000000, initialize it, transpose it and print it to stdout.
(Note: on a *NIX system you WILL need to set ulimit to unlimited.)
ISSUES with my answer:
- Optimizing compilers: they change a LOT of things!
- Type of system
- Please point any others out
- This system has an Intel i5 processor
Answer by Juwhan Kim
A short addendum to the above answers. In C, where memory is accessed almost directly, row-major or column-major order affects your program in two ways:
1. It affects the layout of your matrix in memory.
2. It dictates the order of element access that must be kept, in the form of loop ordering.
Point 1 is explained quite thoroughly in the previous answers, so I will add to point 2.
eulerworks' answer points out that in his example, the column-major access pattern brought about a significant slowdown in calculation. Well, he is right, but the result can just as well be reversed.
The loop order was for(over rows) { for(over columns) { do something over a matrix } }, which means that the double loop accesses the elements of one row and then moves on to the next row. For example, A(0,0) -> A(0,1) -> A(0,2) -> ... -> A(0,N_COLS-1) -> A(1,0) -> ...
In such a case, if A was stored in row-major format there would be minimal cache misses, since the elements will probably be lined up linearly in memory. In column-major format, on the other hand, memory access will jump around using N_ROWS as a stride. So row-major is faster in this case.
Now, we can actually switch the loops so that they become for(over columns) { for(over rows) { do something over a matrix } }. In this case, the result will be exactly the opposite: column-major calculation will be faster, since the loop will read the elements of each column linearly.
Hence, you might as well remember this:
1. Selecting a row-major or column-major storage format is up to your taste, even though the traditional C programming community seems to prefer the row-major format.
2. Although you are pretty much free to choose whatever you like, you need to be consistent with the notion of indexing.
3. Also, and this is quite important: when writing down your own algorithms, try to order the loops so that they honor the storage format of your choice.
4. Be consistent.
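To see point 3 concretely, this C sketch (the helper name is hypothetical) records the memory offsets a double loop touches in a matrix stored column-major, where element (i, j) lives at i + j * n_rows. With columns in the outer loop the offsets run 0, 1, 2, ...; with rows in the outer loop they jump by n_rows.

```c
#include <stddef.h>

/* Record, in visit order, the offsets a double loop touches in an
   n_rows x n_cols matrix stored column-major (element (i, j) at i + j * n_rows).
   cols_outer != 0: for (columns) { for (rows) }, matching the storage order.
   cols_outer == 0: for (rows) { for (columns) }, striding by n_rows.
   offsets must hold n_rows * n_cols entries. */
void visit_order(size_t n_rows, size_t n_cols, int cols_outer, size_t *offsets)
{
    size_t k = 0;
    if (cols_outer) {
        for (size_t j = 0; j < n_cols; j++)
            for (size_t i = 0; i < n_rows; i++)
                offsets[k++] = i + j * n_rows;
    } else {
        for (size_t i = 0; i < n_rows; i++)
            for (size_t j = 0; j < n_cols; j++)
                offsets[k++] = i + j * n_rows;
    }
}
```

For a 2×3 column-major matrix, the column-outer order visits offsets 0, 1, 2, 3, 4, 5, while the row-outer order visits 0, 2, 4, 1, 3, 5.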
Answer by eholley
Given the explanations above, here is a code snippet (in Swift) demonstrating the concept.
//----------------------------------------------------------------------------------------
// A generalized example of row-major, index/coordinate conversion for
// one-/two-dimensional arrays.
// ex: data[i] <-> data[r][c]
//
// Sandboxed at: http://swift.sandbox.bluemix.net/#/repl/5a077c462e4189674bea0810
//
// -eholley
//----------------------------------------------------------------------------------------
// Algorithm
let numberOfRows = 3
let numberOfColumns = 5
let numberOfIndexes = numberOfRows * numberOfColumns
func index(row: Int, column: Int) -> Int {
return (row * numberOfColumns) + column
}
func rowColumn(index: Int) -> (row: Int, column: Int) {
return (index / numberOfColumns, index % numberOfColumns)
}
//----------------------------------------------------------------------------------------
// Testing
let oneDim = [
0, 1, 2, 3, 4,
5, 6, 7, 8, 9,
10, 11, 12, 13, 14,
]
let twoDim = [
[ 0, 1, 2, 3, 4 ],
[ 5, 6, 7, 8, 9 ],
[ 10, 11, 12, 13, 14 ],
]
for i1 in 0..<numberOfIndexes {
let v1 = oneDim[i1]
let rc = rowColumn(index: i1)
let i2 = index(row: rc.row, column: rc.column)
let v2 = oneDim[i2]
let v3 = twoDim[rc.row][rc.column]
print(i1, v1, i2, v2, v3, rc)
assert(i1 == i2)
assert(v1 == v2)
assert(v2 == v3)
}
/* Output:
0 0 0 0 0 (row: 0, column: 0)
1 1 1 1 1 (row: 0, column: 1)
2 2 2 2 2 (row: 0, column: 2)
3 3 3 3 3 (row: 0, column: 3)
4 4 4 4 4 (row: 0, column: 4)
5 5 5 5 5 (row: 1, column: 0)
6 6 6 6 6 (row: 1, column: 1)
7 7 7 7 7 (row: 1, column: 2)
8 8 8 8 8 (row: 1, column: 3)
9 9 9 9 9 (row: 1, column: 4)
10 10 10 10 10 (row: 2, column: 0)
11 11 11 11 11 (row: 2, column: 1)
12 12 12 12 12 (row: 2, column: 2)
13 13 13 13 13 (row: 2, column: 3)
14 14 14 14 14 (row: 2, column: 4)
*/
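The Swift snippet covers the row-major conversion only. For comparison, here is a C sketch of the column-major counterpart (the function names are hypothetical): the roles swap, so the row is index % n_rows and the column is index / n_rows.

```c
#include <stddef.h>

/* Column-major: each column of n_rows elements is stored contiguously. */
size_t cm_index(size_t row, size_t column, size_t n_rows)
{
    return column * n_rows + row;
}

/* Inverse of cm_index: recover (row, column) from a flat column-major index. */
void cm_row_column(size_t index, size_t n_rows, size_t *row, size_t *column)
{
    *row = index % n_rows;
    *column = index / n_rows;
}
```

For the same 3×5 shape as the Swift example, index 3 maps to (row 0, column 1) and index 7 maps to (row 1, column 2), and converting every index 0..14 to (row, column) and back returns the original index.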
Answer by Paul
Today there is no reason to use anything other than column-major order; several C/C++ libraries support it (Eigen, Armadillo, ...). Furthermore, column-major order is more natural: e.g. pictures indexed [x,y,z] are stored slice by slice in a file, which is column-major order. While in two dimensions it may be confusing to choose the better order, in higher dimensions it is quite clear that column-major order is the only solution in many situations.
The authors of C created the concept of arrays, but perhaps they did not expect that somebody would use them as matrices. I would be shocked myself if I saw how arrays are used in places where everything had already been built in Fortran with column-major order. I think that row-major order is simply an alternative to column-major order, to be used only where it is really needed (for now I don't know of any such case).
It is strange that people still create libraries with row-major order. It is an unnecessary waste of energy and time. I hope that one day everything will be column-major and all the confusion will simply disappear.


