How can I read and manipulate CSV file data in C++?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/415515/
Asked by zkwentz
Pretty self-explanatory. I tried Google and got a lot of the dreaded Experts Exchange results, and I searched here as well, to no avail. An online tutorial or example would be best. Thanks guys.
Accepted answer by Tom
If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:
struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
    struct {
        std::string street;
        std::string unit;
    } address;
    char state[2];
    int zip;
};
Thus, when you're working with a collection of data, it makes sense to have a std::vector<Customer> or std::set<Customer>.
With that in mind, think of your CSV handling as two operations:
// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
    CSVReader(const std::string &inputFile);
    bool hasNextLine();
    void readNextLine(std::vector<std::string> &fields);
private:
    /* secrets */
};
class CSVWriter {
public:
    CSVWriter(const std::string &outputFile);
    void writeNextLine(const std::vector<std::string> &fields);
private:
    /* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);
Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:
- Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
- You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
- Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
- CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.
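To make the row-at-a-time idea concrete, here is a minimal sketch of how the "secrets" might be filled in. It is one possible implementation, not the answer author's. Assumptions: fields never contain commas, quotes, or line breaks; SimpleCustomer is a hypothetical three-column stand-in for the Customer struct; and the reader and writer simply wrap std::ifstream and std::ofstream.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Customer (assumes three columns).
struct SimpleCustomer {
    int id;
    std::string first_name;
    std::string last_name;
};

class CSVReader {
public:
    explicit CSVReader(const std::string &inputFile) : in_(inputFile.c_str()) {}
    // True if another row is available; peek() does not consume any input.
    bool hasNextLine() {
        return in_.good() && in_.peek() != std::ifstream::traits_type::eof();
    }
    // Splits the next row on commas. Assumes there are no quoted fields.
    void readNextLine(std::vector<std::string> &fields) {
        fields.clear();
        std::string line;
        if (!std::getline(in_, line)) return;
        std::istringstream row(line);
        std::string cell;
        while (std::getline(row, cell, ',')) fields.push_back(cell);
    }
private:
    std::ifstream in_;
};

class CSVWriter {
public:
    explicit CSVWriter(const std::string &outputFile) : out_(outputFile.c_str()) {}
    // Joins the fields with commas and ends the row. No quoting is applied.
    void writeNextLine(const std::vector<std::string> &fields) {
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i != 0) out_ << ',';
            out_ << fields[i];
        }
        out_ << '\n';
    }
private:
    std::ofstream out_;
};

// Converts rows to customers one at a time; rows with fewer than 3 fields are skipped.
void readCustomers(CSVReader &reader, std::vector<SimpleCustomer> &customers) {
    std::vector<std::string> fields;
    while (reader.hasNextLine()) {
        reader.readNextLine(fields);
        if (fields.size() < 3) continue;
        SimpleCustomer c;
        c.id = std::atoi(fields[0].c_str()); // yields 0 if the text is not numeric
        c.first_name = fields[1];
        c.last_name = fields[2];
        customers.push_back(c);
    }
}

void writeCustomers(CSVWriter &writer, const std::vector<SimpleCustomer> &customers) {
    for (std::size_t i = 0; i < customers.size(); ++i) {
        std::vector<std::string> fields;
        fields.push_back(std::to_string(customers[i].id));
        fields.push_back(customers[i].first_name);
        fields.push_back(customers[i].last_name);
        writer.writeNextLine(fields);
    }
}
```

Only one row is held in memory at a time, and nothing outside readCustomers and writeCustomers needs to know the data came from a CSV file. The writer must go out of scope (or otherwise be flushed) before the same file is read back.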
Answer by Martin York
More information would be useful.
But the simplest form:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
int main()
{
    std::ifstream data("plop.csv");
    std::string line;
    while(std::getline(data,line))
    {
        std::stringstream lineStream(line);
        std::string cell;
        while(std::getline(lineStream,cell,','))
        {
            // You have a cell!!!!
        }
    }
}
Also see this question: CSV parser in C++
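One detail of the simplest form is worth knowing: the inner std::getline loop silently drops a trailing empty cell, so "a,b," yields two cells rather than three, and a file with Windows line endings leaves a '\r' on the last cell. The helper below is a small sketch that avoids both; splitLine is a hypothetical name, and it still assumes there are no quoted fields.

```cpp
#include <string>
#include <vector>

// Splits one line on a delimiter, keeping empty cells (including a trailing one).
// Assumes no quoted fields. An empty line yields a single empty cell.
std::vector<std::string> splitLine(std::string line, char delim = ',') {
    // Drop a trailing carriage return left over from Windows line endings.
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    std::vector<std::string> cells;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            cells.push_back(line.substr(start)); // last cell, possibly empty
            break;
        }
        cells.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return cells;
}
```

It can replace the inner loop directly: call splitLine(line) for each line read by the outer std::getline.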
Answer by Alessandro Jacopson
You can try the Boost Tokenizer library, in particular the Escaped List Separator
Answer by Marc Bernier
I've worked with a lot of CSV files in my time. I'd like to add the advice:
1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".
2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.
3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes), as in "George ""Babe"" Ruth".
4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
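The rules above can be sketched as a small state machine. This is an illustrative parser, not a complete CSV implementation: parseCsvRecord is a hypothetical name, the delimiter is assumed to be a comma, and a quote appearing in the middle of an unquoted field is leniently treated as the start of quoting. For point 4, the function reports when a record ends inside an open quoted field so the caller can append the next physical line and retry.

```cpp
#include <string>
#include <vector>

// Parses one CSV record into fields, honouring double-quoted fields and
// doubled quotes inside them. Returns false if the record ends while a
// quoted field is still open, which suggests an embedded line break: the
// caller can append the line break plus the next physical line and call again.
bool parseCsvRecord(const std::string &record, std::vector<std::string> &fields) {
    fields.clear();
    std::string field;
    bool inQuotes = false;
    for (std::string::size_type i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < record.size() && record[i + 1] == '"') {
                    field += '"';     // doubled quote becomes one literal quote
                    ++i;
                } else {
                    inQuotes = false; // closing quote
                }
            } else {
                field += c;           // commas and line breaks are kept verbatim
            }
        } else if (c == '"') {
            inQuotes = true;          // opening quote
        } else if (c == ',') {
            fields.push_back(field);  // unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);
    return !inQuotes;
}
```

Because quoted and unquoted fields go through the same loop, sources that quote everything, nothing, or only some fields (point 2) are all handled the same way.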
Answer by Marc Bernier
This is a good exercise for yourself to work on :)
You should break your library into three parts:
- Loading the CSV file
- Representing the file in memory so that you can modify it and read it
- Saving the CSV file back to disk
So you are looking at writing a CSVDocument class that contains:
- Load(const char* file);
- Save(const char* file);
- GetBody
So that you may use your library like this:
CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();
CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}
for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}
body->GetRow(10)->GetField(0)->SetText("hello world"); // assumes the file has at least 11 rows
CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");
doc.Save("file.csv");
Which gives us the following interfaces:
// forward declarations so each class can refer to the ones defined below it
class CSVDocumentBody;
class CSVDocumentRow;
class CSVDocumentField;
class CSVDocument
{
public:
    void Load(const char* file);
    void Save(const char* file);
    CSVDocumentBody* GetBody();
};
class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};
class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField();
};
class CSVDocumentField
{
public:
    const char* GetText();
    void SetText(const char* text);
};
Now you just have to fill in the blanks from here :)
Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.
:)
EDIT
I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.
Answer by Marc Bernier
Here is some code you can use. The data from the CSV is stored inside an array of rows. Each row is an array of strings. Hope this helps.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;
void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);
int main(){
    std::fstream file("file.csv", std::ios::in);
    if(!file.is_open()){
        std::cout << "File not found!\n";
        return 1;
    }
    CSVDatabase db;
    readCSV(file, db);
    display(db);
}
void readCSV(std::istream &input, CSVDatabase &db){
    String csvLine;
    // read every line from the stream
    while( std::getline(input, csvLine) ){
        std::istringstream csvStream(csvLine);
        CSVRow csvRow;
        String csvCol;
        // read every element from the line that is separated by commas
        // and put it into the vector of strings
        while( std::getline(csvStream, csvCol, ',') )
            csvRow.push_back(csvCol);
        db.push_back(csvRow);
    }
}
void display(const CSVRow& row){
    if(!row.size())
        return;
    CSVRowCI i=row.begin();
    std::cout<<*(i++);
    for(;i != row.end();++i)
        std::cout<<','<<*i;
}
void display(const CSVDatabase& db){
    if(!db.size())
        return;
    CSVDatabaseCI i=db.begin();
    for(; i != db.end(); ++i){
        display(*i);
        std::cout<<std::endl;
    }
}
Answer by Jonathan Leffler
Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.
(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)
Answer by stefanB
Using boost tokenizer to parse records, see here for more details.
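#include <algorithm>   // copy
#include <fstream>     // ifstream
#include <iostream>    // cout, endl
#include <iterator>    // ostream_iterator
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
int main()
{
    using namespace std;
    using namespace boost;
    string data("data.csv"); // placeholder path; the snippet does not show where data comes from
    ifstream in(data.c_str());
    if (!in.is_open()) return 1;
    typedef tokenizer< escaped_list_separator<char> > Tokenizer;
    vector< string > vec;
    string line;
    while (getline(in,line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(),tok.end());
        /// do something with the record
        if (vec.size() < 3) continue;
        copy(vec.begin(), vec.end(),
             ostream_iterator<string>(cout, "|"));
        cout << "\n----------------------" << endl;
    }
}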
Answer by Kevin P.
I found this interesting approach:
Quote: CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.
Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.