C++ 循环遍历特征矩阵的最有效方法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16283000/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Most efficient way to loop through an Eigen matrix
提问by random_user
I'm creating some functions to do things like the "separated sum" of negative and positive number, kahan, pairwise and other stuff where it doesn't matter the order I take the elements from the matrix, for example:
我正在创建一些函数来执行诸如负数和正数、kahan、pairwise 和其他与我从矩阵中获取元素的顺序无关的“分离和”之类的事情,例如:
template <typename T, int R, int C>
inline T sum(const Eigen::Matrix<T,R,C>& xs)
{
T sumP(0);
T sumN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
{
if (xs(i,j)>0)
sumP += xs(i,j);
else if (xs(i,j)<0) //ignore 0 elements: improvement for sparse matrices I think
sumN += xs(i,j);
}
return sumP+sumN;
}
Now, I would like to make this as efficient as possible, so my question is, would it be better to loop through each column of each row like the above, or do the opposite like the the following:
现在,我想让它尽可能高效,所以我的问题是,像上面那样遍历每一行的每一列会更好,还是像下面那样做相反的事情:
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
(I suppose this depends on the order that the matrix elements are allocated in memory, but I couldn't find this in Eigen's manual).
(我想这取决于矩阵元素在内存中的分配顺序,但我在 Eigen 的手册中找不到它)。
Also, are there other alternate ways like using iterators (do they exist in Eigen?) that might be slightly faster?
此外,是否还有其他替代方法,例如使用迭代器(它们是否存在于 Eigen 中?)可能会稍微快一些?
采纳答案by jleahy
Eigen allocates matrices in column-major (Fortran) order by default (documentation).
默认情况下,Eigen 按列主 (Fortran) 顺序分配矩阵(文档)。
The fastest way to iterate over a matrix is in storage order, doing it the wrong way round will increase the number of cache misses (which if your matrix doesn't fit in L1 will dominate your computation time, so read increase your computation time) by a factor of cacheline/elemsize (probably 64/8=8).
迭代矩阵的最快方法是按存储顺序,以错误的方式进行将增加缓存未命中的数量(如果您的矩阵不适合 L1 将支配您的计算时间,因此请阅读增加您的计算时间)通过缓存线/ elemsize 的因子(可能是 64/8=8)。
If your matrix fits into L1 cache this won't make a difference, but a good compiler should be able to vectorise the loop, which with AVX enabled (on a shiny new core i7) could give you a speedup of as much as 4 times. (256 bits / 64 bits).
如果您的矩阵适合 L1 缓存,这不会有什么不同,但是一个好的编译器应该能够矢量化循环,启用 AVX(在闪亮的新核心 i7 上)可以使您的速度提高多达 4 倍. (256 位/64 位)。
Finally don't expect any of Eigen's built-in functions to give you a speed-up (I don't think there are iterators anyhow, but I may be mistaken), they're just going to give you the same (very simple) code.
最后,不要指望 Eigen 的任何内置函数都能为您提供加速(我认为无论如何都没有迭代器,但我可能会弄错),它们只会给您相同的效果(非常简单) ) 代码。
TLDR: Swap your order of iteration, you need to vary the row index most quickly.
TLDR:交换你的迭代顺序,你需要最快地改变行索引。
回答by random_user
I did some benchmarks to checkout which way is quicker, I got the following results (in seconds):
我做了一些基准测试来检查哪种方式更快,我得到了以下结果(以秒为单位):
12
30
3
6
23
3
The first line is doing iteration as suggested by @jleahy.
The second line is doing iteration as I've done in my code in the question (the inverse order of @jleahy).
The third line is doing iteration using PlainObjectBase::data()
like this for (int i = 0; i < matrixObject.size(); i++)
.
The other 3 lines repeat the same as the above, but with an temporary as suggest by @lucas92
第一行是按照@jleahy 的建议进行迭代。第二行是像我在问题中的代码中所做的那样进行迭代(@jleahy 的逆序)。第三行使用PlainObjectBase::data()
这样的方法进行迭代for (int i = 0; i < matrixObject.size(); i++)
。其他 3 行重复与上述相同,但有@lucas92 建议的临时行
I also did the same tests but using substituting /if else.*/ for /else/ (no special treatment for sparse matrix) and I got the following (in seconds):
我也做了同样的测试,但使用 /if else.*/ 代替 /else/(稀疏矩阵没有特殊处理),我得到以下结果(以秒为单位):
10
27
3
6
24
2
Doing the tests again gave me results pretty similar. I used g++ 4.7.3
with -O3
. The code:
再次进行测试给了我非常相似的结果。我用g++ 4.7.3
用-O3
。编码:
#include <ctime>
#include <iostream>
#include <Eigen/Dense>
using namespace std;
template <typename T, int R, int C>
inline T sum_kahan1(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
{
if (xs(j,i)>0)
{
yP = xs(j,i) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if (xs(j,i)<0)
{
yN = xs(j,i) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan2(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
{
if (xs(i,j)>0)
{
yP = xs(i,j) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if (xs(i,j)<0)
{
yN = xs(i,j) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan3(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, size = xs.size(); i < size; i++)
{
if ((*(xs.data() + i))>0)
{
yP = (*(xs.data() + i)) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if ((*(xs.data() + i))<0)
{
yN = (*(xs.data() + i)) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan1t(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
{
T temporary = xs(j,i);
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if (temporary<0)
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan2t(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
{
T temporary = xs(i,j);
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if (temporary<0)
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan3t(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, size = xs.size(); i < size; i++)
{
T temporary = (*(xs.data() + i));
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else if (temporary<0)
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan1e(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
{
if (xs(j,i)>0)
{
yP = xs(j,i) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = xs(j,i) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan2e(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
{
if (xs(i,j)>0)
{
yP = xs(i,j) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = xs(i,j) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan3e(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, size = xs.size(); i < size; i++)
{
if ((*(xs.data() + i))>0)
{
yP = (*(xs.data() + i)) - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = (*(xs.data() + i)) - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan1te(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
{
T temporary = xs(j,i);
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan2te(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
{
T temporary = xs(i,j);
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
template <typename T, int R, int C>
inline T sum_kahan3te(const Eigen::Matrix<T,R,C>& xs) {
if (xs.size() == 0) return 0;
T sumP(0);
T sumN(0);
T tP(0);
T tN(0);
T cP(0);
T cN(0);
T yP(0);
T yN(0);
for (size_t i = 0, size = xs.size(); i < size; i++)
{
T temporary = (*(xs.data() + i));
if (temporary>0)
{
yP = temporary - cP;
tP = sumP + yP;
cP = (tP - sumP) - yP;
sumP = tP;
}
else
{
yN = temporary - cN;
tN = sumN + yN;
cN = (tN - sumN) - yN;
sumN = tN;
}
}
return sumP+sumN;
}
int main() {
Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> test = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>::Random(10000,10000);
cout << "start" << endl;
int now;
now = time(0);
sum_kahan1(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan2(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan3(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan1t(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan2t(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan3t(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan1e(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan2e(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan3e(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan1te(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan2te(test);
cout << time(0) - now << endl;
now = time(0);
sum_kahan3te(test);
cout << time(0) - now << endl;
return 0;
}
回答by Jeff Trull
I notice that the code is equivalent to the sum of all the entries in the matrix, i.e., you could just do this:
我注意到代码相当于矩阵中所有条目的总和,即,您可以这样做:
return xs.sum();
I would assume that would perform better since it's only a single pass, and furthermore Eigen ought to "know" how to arrange the passes for optimum performance.
我认为这会表现得更好,因为它只是一次传球,而且 Eigen 应该“知道”如何安排传球以获得最佳性能。
If, however, you want to retain the two passes, you could express this using the coefficient-wise reduction mechanisms, like this:
但是,如果您想保留两次传递,您可以使用系数减少机制来表达这一点,如下所示:
return (xs.array() > 0).select(xs, 0).sum() +
(xs.array() < 0).select(xs, 0).sum();
which uses the boolean coefficient-wise select to pick the positive and negative entries. I don't know whether it would outperform the hand-rolled loops, but in theory coding it this way allows Eigen (and the compiler) to know more about your intention, and possibly improve the results.
它使用布尔系数明智选择来选择正负条目。我不知道它是否会胜过手动循环,但理论上以这种方式编码可以让 Eigen(和编译器)更多地了解您的意图,并可能改善结果。
回答by lucas92
Try to store xs(i,j) inside a temporary variable inside the loop so you only call the function once.
尝试将 xs(i,j) 存储在循环内的临时变量中,以便您只调用该函数一次。