Disclaimer: this content is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/16617532/
Large CSV to JSON/Object in Node.js
Asked by neverfox
I am trying to do something that seems like it should not only be fairly simple to accomplish, but a common enough task that there would be straightforward packages available to do it. I wish to take a large CSV file (an export from a relational database table) and convert it to an array of JavaScript objects. Furthermore, I would like to export it to a .json file fixture.
Example CSV:
a,b,c,d
1,2,3,4
5,6,7,8
...
Desired JSON:
[
{"a": 1,"b": 2,"c": 3,"d": 4},
{"a": 5,"b": 6,"c": 7,"d": 8},
...
]
I've tried several Node CSV parsers, streamers, and self-proclaimed CSV-to-JSON libraries, but I can't seem to get the result I want, or if I can, it only works if the files are smaller. My file is nearly 1 GB in size with ~40m rows (which would create 40m objects). I expect it would require streaming the input and/or output to avoid memory problems.
Here are the packages I've tried:
- https://github.com/klaemo/csv-stream
- https://github.com/koles/ya-csv
- https://github.com/davidgtonge/stream-convert (works, but it is so exceedingly slow as to be useless, since I alter the dataset often. It took nearly 3 hours to parse a 60 MB csv file)
- https://github.com/cgiffard/CSVtoJSON.js
- https://github.com/wdavidw/node-csv-parser (doesn't seem to be designed for converting csv to other formats)
- https://github.com/voodootikigod/node-csv
I'm using Node 0.10.6 and would like a recommendation on how to easily accomplish this. Rolling my own might be best but I'm not sure where to begin with all of Node's streaming features, especially since they changed the API in 0.10.x.
Accepted answer by Myrne Stol
While this is far from a complete answer, you may be able to base your solution on https://github.com/dominictarr/event-stream. Adapted example from the readme:
var es = require('event-stream')

es.pipeline(                          // connect streams together with `pipe`
  process.openStdin(),                // open stdin
  es.split(),                         // split stream to break on newlines
  es.map(function (data, callback) {  // turn this async function into a stream
    // deal with one line of CSV data
    callback(null, JSON.stringify(parseCSVLine(data)))
  }),
  process.stdout
)
After that, I expect you have a bunch of stringified JSON objects, one per line. This then needs to be converted to an array, which you may be able to do by appending , to the end of every line, removing it on the last, and then adding [ and ] to the beginning and end of the file.
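A minimal sketch of that wrapping step, assuming exactly one stringified object arrives per line (this helper is not part of the original answer; it uses event-stream's through):

var es = require('event-stream')

// Prefix every record after the first with a comma, and emit the
// surrounding brackets from the stream's write/end hooks.
var first = true
var toJSONArray = es.through(
  function write (line) {
    this.queue((first ? '[' : ',') + line)
    first = false
  },
  function end () {
    this.queue(']')
    this.queue(null) // signal end of stream
  }
)

It could be placed in the pipeline between the es.map(...) stage and process.stdout.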
The parseCSVLine function must be configured to assign the CSV values to the right object properties. This can be fairly easily done after parsing the first line of the file.
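For illustration, a hypothetical parseCSVLine for the simple comma-separated data in the question, with the header keys hard-coded (a real implementation would capture them from the first line and would need a proper parser for quoted fields):

// Keys taken from the header row of the example CSV
var keys = ['a', 'b', 'c', 'd']

function parseCSVLine (line) {
  var values = line.split(',')
  var obj = {}
  keys.forEach(function (key, i) {
    obj[key] = values[i]
  })
  return obj
}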
I do notice the library is not tested on 0.10 (at least not with Travis), so beware. Maybe run npm test on the source yourself.
Answer by Keyang
Check out the node.js csvtojson module, which can be used as a library, a command-line tool, or a web server plugin: https://www.npmjs.org/package/csvtojson. The source code can be found at https://github.com/Keyang/node-csvtojson
Or install it from the npm registry:
npm install -g csvtojson
It supports CSV data of any size, field types, nested JSON, and a bunch of other features.
Example
var Converter = require("csvtojson").core.Converter;

// constructResult: false turns off building the final result in memory,
// enabling the stream feature. toArrayString streams out a normal JSON
// array object.
var csvConverter = new Converter({ constructResult: false, toArrayString: true });

var readStream = require("fs").createReadStream("inputData.csv");
var writeStream = require("fs").createWriteStream("outputData.json");

readStream.pipe(csvConverter).pipe(writeStream);
You can also use it as a CLI tool:
csvtojson myCSVFile.csv
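Assuming the CLI writes the converted JSON to stdout (an assumption here, not taken from the original answer), the output could be redirected into a file:

csvtojson myCSVFile.csv > myJSONFile.json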
Answer by morz
I found an even easier way to read CSV data using csvtojson.
Here's the code:
var Converter = require("csvtojson").Converter;
var converter = new Converter({});

converter.fromFile("sample.csv", function (err, result) {
  var csvData = JSON.stringify([
    { resultdata: result[0] },
    { resultdata: result[1] },
    { resultdata: result[2] },
    { resultdata: result[3] },
    { resultdata: result[4] }
  ]);
  csvData = JSON.parse(csvData);
  console.log(csvData);
});
Or you can simply do this:
var Converter = require("csvtojson").Converter;
var converter = new Converter({});

converter.fromFile("sample.csv", function (err, result) {
  console.log(result);
});
Here's the result from the first code snippet:
[ { resultdata:
     { 'Header 1': 'A_1',
       'Header 2': 'B_1',
       'Header 3': 'C_1',
       'Header 4': 'D_1',
       'Header 5': 'E_1' } },
  { resultdata:
     { 'Header 1': 'A_2',
       'Header 2': 'B_2',
       'Header 3': 'C_2',
       'Header 4': 'D_2',
       'Header 5': 'E_2' } },
  { resultdata:
     { 'Header 1': 'A_3',
       'Header 2': 'B_3',
       'Header 3': 'C_3',
       'Header 4': 'D_3',
       'Header 5': 'E_3' } },
  { resultdata:
     { 'Header 1': 'A_4',
       'Header 2': 'B_4',
       'Header 3': 'C_4',
       'Header 4': 'D_4',
       'Header 5': 'E_4' } },
  { resultdata:
     { 'Header 1': 'A_5',
       'Header 2': 'B_5',
       'Header 3': 'C_5',
       'Header 4': 'D_5',
       'Header 5': 'E_5' } } ]
The source of this code can be found at: https://www.npmjs.com/package/csvtojson#installation
I hope this gives you some ideas.
Answer by HaNdTriX
I recommend implementing the logic yourself. Node.js is actually pretty good at these kinds of tasks.
The following solution uses streams, so it won't blow up your memory.
Install Dependencies
npm install through2 split2 --save
Code
import fs from 'fs'
import through2 from 'through2'
import split2 from 'split2'

const parseCSV = () => {
  let templateKeys = []
  let parseHeadline = true
  return through2.obj((data, enc, cb) => {
    if (parseHeadline) {
      // The first line holds the column names; keep them as object keys
      templateKeys = data
        .toString()
        .split(';')
      parseHeadline = false
      return cb(null, null)
    }
    const entries = data
      .toString()
      .split(';')
    const obj = {}
    templateKeys.forEach((el, index) => {
      obj[el] = entries[index]
    })
    return cb(null, obj)
  })
}

const processRecord = () => {
  return through2.obj(function (data, enc, cb) {
    // Implement your own processing logic here, e.g.
    // (MyDB is a placeholder for your own database client):
    MyDB
      .insert(data)
      .then(() => cb())
      .catch(cb)
  })
}

fs.createReadStream('<yourFilePath>')
  // Read line by line
  .pipe(split2())
  // Parse each CSV line
  .pipe(parseCSV())
  // Process your records
  .pipe(processRecord())
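If the goal is the question's .json fixture rather than database inserts, the processing stage could be swapped for a stringifying transform. A rough sketch, not part of the original answer, that writes newline-delimited JSON (the enclosing array brackets and commas would still need to be added, as described in the accepted answer):

const toNDJSON = () =>
  through2.obj((data, enc, cb) => {
    // Serialize each record as one JSON line
    cb(null, JSON.stringify(data) + '\n')
  })

fs.createReadStream('<yourFilePath>')
  .pipe(split2())
  .pipe(parseCSV())
  .pipe(toNDJSON())
  .pipe(fs.createWriteStream('<yourOutputPath>'))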
For more info about this topic, see Stefan Baumgartner's excellent tutorial.
Answer by Bogdan
You can use streams so that you can process big files. Here is what you need to do. This should work just fine.
npm i --save csv2json fs-extra  # install the modules
const csv2json = require('csv2json');
const fs = require('fs-extra');

const source = fs.createReadStream(__dirname + '/data.csv');
const output = fs.createWriteStream(__dirname + '/result.json');

source
  .pipe(csv2json())
  .pipe(output);
Answer by Michał Kapracki
Hmm... lots of solutions. I'll add one more, with scramjet:
$ npm install --save scramjet
And then
process.stdin
  .pipe(new (require("scramjet").StringStream)("utf-8"))
  .CSVParse()
  .toJSONArray()
  .pipe(process.stdout)
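Since the snippet reads from stdin and writes to stdout, a typical invocation (file names hypothetical) would be:

node convert.js < data.csv > data.json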
This will produce exactly the output you described, in a streamed way.