C++ std::bad_alloc error

Question

Asked by Achintha Gunasekara

I'm working on a C++ program (C++98). It reads a text file with many lines (10,000) of tab-separated values and parses them into a vector of vectors of strings. It works for smaller files, but one of my files gives me the following error (this file has 10,000 lines and is 90 MB). I'm guessing this is a memory-related issue? Can you please help me?

Error

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Abort

Code

void AppManager::go(string customerFile) {

    vector<vector<string> > vals = fileReader(customerFile);

    for (unsigned int i = 0; i < vals.size();i++){

        cout << "New One\n\n";

        for (unsigned int j = 0; j < vals[i].size(); j++){

            cout << vals[i][j] << endl;
        }

        cout << "End New One\n\n";
    }
}

vector<vector<string> > AppManager::fileReader(string fileName) {

    string line;
    vector<vector<string> > values;

    ifstream inputFile(fileName.c_str());

    if (inputFile.is_open()) {

        while (getline(inputFile,line)) {

            std::istringstream iss(line);
            std::string val;
            vector<string> tmp;

            while(std::getline(iss, val, '\t')) {

                tmp.push_back(val);
            }

            values.push_back(tmp);
        }

        inputFile.close();
    }
    else {

        throw string("Error reading the file '" + fileName + "'");
    }

    return values;
}

Answer 1

Answered by Reinstate Monica

There's nothing wrong with your code. You're simply running on a platform that likely has small memory limits, an old compiler, and an old C++ library. It all conspires against you. You'll have to micro-optimize :(

Here's what you can do, starting with the lowest-hanging fruit:

1. Do a dry run through the file, just counting the lines. Then call values.resize(numberOfLines), seek back to the beginning, and only then read the values. Of course you won't be using values.push_back anymore, merely values[lineNumber] = tmp. Growing the values vector as you add to it can temporarily more than double the amount of memory your process needs.
2. At the end of each line, shrink tmp so that it just fits the data. Note that tmp.resize(tmp.size()) leaves any spare capacity in place; in C++98 the usual way to release it is the swap idiom, vector<string>(tmp).swap(tmp).
3. You can reduce overhead in the existing code by storing all the values in one vector.
   1. If each line has a different number of elements but you access them sequentially later on, you can store an empty string as an internal delimiter; this may have lower overhead than a vector per line.
   2. If each line has the same number of values, splitting them by line adds unnecessary overhead: you know the index of the first value in each line, it's simply lineNumber * valuesPerLine, where the first line has number 0.
4. Memory-map the file. Store the beginning and end of each word in a struct that is an element of a vector, perhaps with a line number as well if you need it split up by lines.
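
As a rough illustration of the dry-run and single-vector suggestions above, here is a minimal C++98-style sketch. It is not the asker's code: the names FlatTable, countFields and readFlat are made up for this example, it assumes every line has the same number of tab-separated values as the first line (shorter lines are padded with empty strings, extra values are dropped), and it calls reserve on one flat vector rather than resize on a vector of vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: every field of every line lives in one flat vector.
struct FlatTable {
    std::vector<std::string> cells;  // all fields, line after line
    std::size_t valuesPerLine;       // assumed identical for every line

    FlatTable() : valuesPerLine(0) {}

    std::size_t lineCount() const {
        return valuesPerLine == 0 ? 0 : cells.size() / valuesPerLine;
    }

    // The first value of a line sits at lineNumber * valuesPerLine (first line is 0).
    const std::string& at(std::size_t lineNumber, std::size_t column) const {
        assert(lineNumber < lineCount());
        assert(column < valuesPerLine);
        return cells[lineNumber * valuesPerLine + column];
    }
};

// Count tab-separated fields the same way the question's inner loop splits them.
static std::size_t countFields(const std::string& line) {
    std::istringstream iss(line);
    std::string val;
    std::size_t n = 0;
    while (std::getline(iss, val, '\t')) ++n;
    return n;
}

// Fills 'table' through a reference so no large copy is made on return.
void readFlat(std::istream& in, FlatTable& table) {
    std::string line;

    // Pass 1: dry run. Count lines; take the field count from the first line.
    std::size_t numberOfLines = 0;
    while (std::getline(in, line)) {
        if (numberOfLines == 0) table.valuesPerLine = countFields(line);
        ++numberOfLines;
    }

    // One allocation up front, so the vector does not reallocate while filling.
    table.cells.clear();
    table.cells.reserve(numberOfLines * table.valuesPerLine);

    // Rewind: clear the EOF/fail state first, then seek to the beginning.
    in.clear();
    in.seekg(0, std::ios::beg);

    // Pass 2: read the values for real.
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string val;
        std::size_t n = 0;
        while (n < table.valuesPerLine && std::getline(iss, val, '\t')) {
            table.cells.push_back(val);
            ++n;
        }
        // Assumption: pad short lines so the index arithmetic stays valid.
        for (; n < table.valuesPerLine; ++n) table.cells.push_back(std::string());
    }
}
```

With an ifstream opened as in the question, this would be called as readFlat(inputFile, table), and a value read back with table.at(lineNumber, column). Whether it saves enough memory depends on the platform: each std::string still carries its own overhead, which only the memory-mapping suggestion avoids.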
