javascript: Convert large CSV files to JSON
Disclaimer: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/18759516/
Convert large CSV files to JSON
Asked by JVG
I don't mind if this is done with a separate program, with Excel, in NodeJS or in a web app.
It's exactly the same problem as described here:
Large CSV to JSON/Object in Node.js
It seems that the OP didn't get that answer to work (yet accepted it anyway?). I've tried working with it but can't seem to get it to work either.
In short: I'm working with a ~50,000 row CSV and I want to convert it to JSON. I've tried just about every online "csv to json" webapp out there; they all crash with a dataset this large.
I've tried many Node CSV-to-JSON modules but, again, they all crash. The csvtojson module seemed promising, but I got this error: FATAL ERROR: JS Allocation failed - process out of memory.
What on earth can I do to get this data in a usable format? As above, I don't mind if it's an application, something that works within Excel, a webapp or a Node module, so long as I either get a .JSON file or an object that I can work with within Node.
Any ideas?
Answered by Keyang
You mentioned the csvtojson module above; that is an open-source project which I am maintaining.
I am sorry it did not work out for you; that was caused by a bug fixed several months ago. I have also added some extra lines to the README for your scenario. Please check out "Process Big CSV File in Command Line".
Please make sure you have the latest csvtojson release. (Currently it is 0.2.2)
You can update it by running:
npm install -g csvtojson
After you've installed the latest csvtojson, you just need to run:
csvtojson [path to bigcsvdata] > converted.json
This streams data from the csv file. Or, if you want to stream data from another application:
cat [path to bigcsvdata] | csvtojson > converted.json
They will output the same thing.
I have manually tested it with a csv file of over 3 million records, and it works without issue.
I believe you just need a simple tool. The purpose of the lib is to relieve stress like this. Please do let me know if you run into any problems next time so I can solve them in time.
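If you would rather end up with an object inside Node than go through the command line, the module can also be used programmatically. A minimal sketch, assuming a newer csvtojson release with a v2-style streaming API (which differs from the 0.2.2 API above); the downstreamFormat option in particular is an assumption here:

// A sketch assuming a csvtojson v2-style API (newer than the 0.2.2
// release mentioned above); downstreamFormat: 'array' is assumed to
// wrap the streamed rows in a single JSON array.
const csv = require('csvtojson');
const fs = require('fs');

fs.createReadStream('bigcsvdata.csv')            // stream the CSV from disk
  .pipe(csv({ downstreamFormat: 'array' }))      // convert rows to JSON as they flow through
  .pipe(fs.createWriteStream('converted.json')); // write without buffering the whole file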
Answered by Paul Mougel
The npm csv package is able to process a CSV stream without having to store the full file in memory. You'll need to install node.js and csv (npm install csv). Here is a sample application which will write JSON objects to a file:
var csv = require('csv');
var fs = require('fs');

var f = fs.createReadStream('Fielding.csv');   // source CSV
var w = fs.createWriteStream('out.txt');       // destination JSON

w.write('[');                                  // open the JSON array by hand

csv()
  .from.stream(f, { columns: true })           // columns: true -> each row becomes an object keyed by header
  .transform(function (row, index) {
    // prepend a comma before every row except the first
    return (index === 0 ? '' : ',\n') + JSON.stringify(row);
  })
  .to.stream(w, { columns: true, end: false }) // end: false -> keep w open after the CSV stream ends
  .on('end', function () {
    w.write(']');                              // close the JSON array
    w.end();
  });
Please note the columns option, needed to keep the column names in the JSON objects (otherwise you'd get a simple array), and the end option set to false, which tells node not to close the file stream when the CSV stream closes: this allows us to add the final ']'. The transform callback provides a way for your program to hook into the data stream and transform the data before it is written to the next stream.
Answered by moka
When you work with such a large dataset, you need to write streamed processing rather than load > convert > save, since loading something that big will not fit in memory.
The CSV format itself is very simple and has few variations, so you can write a simple parser yourself. JSON is usually simple as well, and can easily be produced line by line without loading the whole thing.
- createReadStream from the CSV file.
- createWriteStream for the new JSON file.
- In on('data', ...), process the read data: append it to a running string and extract full lines when available.
- Along the way, whenever a line (or lines) is available from the readStream, convert it to a JSON object and push it into the writeStream of the new JSON file.
This is quite doable with pipe, with your own transform stream in the middle that converts lines into objects to be written into the new file (see the sketch below).
This approach avoids loading the whole file into memory: it loads a part, processes and writes it, and moves forward gradually.
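A minimal sketch of that idea, assuming a plain comma-separated file with a header row and no quoted or escaped fields (a real parser would need to handle those cases); the file names are placeholders:

// A hand-rolled line-by-line converter, assuming simple CSV:
// a header row, comma separators, no quoted fields.
const fs = require('fs');
const { Transform } = require('stream');

function csvToJsonStream() {
  let header = null;   // column names taken from the first line
  let buffered = '';   // partial line carried over between chunks
  let first = true;    // whether the next row is the first JSON object

  function handleLine(stream, line) {
    if (!line) return;
    const fields = line.split(',');
    if (!header) { header = fields; return; }  // first line is the header
    const row = {};
    header.forEach(function (name, i) { row[name] = fields[i]; });
    stream.push((first ? '[' : ',\n') + JSON.stringify(row));
    first = false;
  }

  return new Transform({
    transform(chunk, encoding, callback) {
      buffered += chunk.toString();
      const lines = buffered.split(/\r?\n/);
      buffered = lines.pop();                  // keep the trailing partial line
      for (const line of lines) handleLine(this, line);
      callback();
    },
    flush(callback) {
      handleLine(this, buffered);              // the file may not end with a newline
      this.push(first ? '[]' : ']');           // close the JSON array
      callback();
    }
  });
}

fs.createReadStream('bigcsvdata.csv')
  .pipe(csvToJsonStream())
  .pipe(fs.createWriteStream('converted.json'));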
Answered by Estevão Lucas
You can try use OpenRefine (or Google Refine).
Import your CSV file. Then you can export it; edit the template for a JSON format.
http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/
Answered by Bogdan
This should do the job.
// install the modules first:
//   npm i --save csv2json fs-extra

const csv2json = require('csv2json');
const fs = require('fs-extra');

const source = fs.createReadStream(__dirname + '/data.csv');     // CSV input
const output = fs.createWriteStream(__dirname + '/result.json'); // JSON output

source
  .pipe(csv2json())  // stream rows through the converter
  .pipe(output);     // write as it is produced, nothing buffered
Answered by ikmahesh
- Use the Python CLI

The script below converts all the csv files in a folder to json files, with no \n\r:
import json
import csv

# convert each numbered CSV file into its own JSON file
for x in range(1, 11):
    with open('9447440523-Huge' + str(x) + '.csv', 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        with open(str(x) + '.json', 'w') as jsonfile:
            jsonfile.write('[')
            for i, row in enumerate(reader):
                if i > 0:
                    jsonfile.write(',')  # comma between objects, none before the first
                json.dump(row, jsonfile)
            jsonfile.write(']')