Python: how to convert a CSV file to multiline JSON?
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/19697846/
Asked by BeanBagKing
Here's my code, really simple stuff...
import csv
import json
csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("FirstName","LastName","IDNumber","Message")
reader = csv.DictReader( csvfile, fieldnames)
out = json.dumps( [ row for row in reader ] )
jsonfile.write(out)
Declare some field names; the reader uses CSV to read the file, and the field names are used to dump the file to JSON format. Here's the problem...
Each record in the CSV file is on a different row. I want the JSON output to be the same way. The problem is it dumps it all on one giant, long line.
I've tried using something like for line in csvfile: and then running my code below that with reader = csv.DictReader(line, fieldnames), which loops through each line, but it does the entire file on one line, then loops through the entire file on another line... and continues until it runs out of lines.
Any suggestions for correcting this?
Edit: To clarify, currently I have: (every record on line 1)
[{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"},{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}]
What I'm looking for: (2 records on 2 lines)
{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"}
{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}
Not each individual field indented or on a separate line, but each record on its own line.
Some sample input.
"John","Doe","001","Message1"
"George","Washington","002","Message2"
Accepted answer by SingleNegationElimination
The problem with your desired output is that it is not a valid JSON document; it's a stream of JSON documents!
That's okay, if it's what you need, but it means that for each document you want in your output, you'll have to call json.dumps.
Since the newline you want separating your documents is not contained in those documents, you're on the hook for supplying it yourself. So we just need to pull the loop out of the call to json.dump and interpose newlines for each document written.
import csv
import json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("FirstName", "LastName", "IDNumber", "Message")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
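Because the result is a stream of documents rather than one document, whatever reads the file back has to parse it line by line as well; calling json.load on the whole file would fail. A minimal sketch of the round trip, using in-memory io.StringIO buffers as stand-ins for file.csv and file.json (the helper names csv_to_json_lines and read_json_lines are made up for this illustration):

```python
import csv
import io
import json

FIELDNAMES = ("FirstName", "LastName", "IDNumber", "Message")

def csv_to_json_lines(csv_stream, json_stream, fieldnames=FIELDNAMES):
    """Write one JSON document per CSV record, each on its own line."""
    for row in csv.DictReader(csv_stream, fieldnames):
        json.dump(row, json_stream)
        json_stream.write('\n')

def read_json_lines(json_stream):
    """Parse the stream back: one json.loads call per non-empty line."""
    return [json.loads(line) for line in json_stream if line.strip()]

# In-memory stand-ins for file.csv and file.json (illustration only).
sample_csv = io.StringIO('"John","Doe","001","Message1"\n'
                         '"George","Washington","002","Message2"\n')
out = io.StringIO()
csv_to_json_lines(sample_csv, out)

out.seek(0)
records = read_json_lines(out)
```

Passing real file objects opened with open() in place of the StringIO buffers should work the same way.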
Answer by Wayne Werner
Add the indent parameter to json.dumps:
import json

data = {'this': ['has', 'some', 'things'],
        'in': {'it': 'with', 'some': 'more'}}
print(json.dumps(data, indent=4))
Also note that you can simply use json.dump with the open jsonfile:
json.dump(data, jsonfile)
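Note that indent puts every field on its own line, which the question says it does not want. If a single valid JSON document with one record per line is acceptable, one possible sketch is to serialize each record separately and join the pieces by hand (dumps_one_record_per_line is a hypothetical helper, not part of the json module):

```python
import json

def dumps_one_record_per_line(records):
    """Serialize a list of dicts as a JSON array, one record per line."""
    body = ",\n".join(json.dumps(record) for record in records)
    return "[\n" + body + "\n]"

records = [
    {"FirstName": "John", "LastName": "Doe", "IDNumber": "123", "Message": "None"},
    {"FirstName": "George", "LastName": "Washington", "IDNumber": "001", "Message": "Something"},
]
text = dumps_one_record_per_line(records)
print(text)
```

The output still parses with a single json.loads call, at the cost of building the array brackets and commas manually.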
Answer by MONTYHS
import csv
import json

# DictReader needs an open file object, not a file name;
# the first row of the file is used as the header.
with open('filename.csv', 'r') as f:
    csvfile = csv.DictReader(f)
    output = []
    for each in csvfile:
        row = {}
        row['FirstName'] = each['FirstName']
        row['LastName'] = each['LastName']
        row['IDNumber'] = each['IDNumber']
        row['Message'] = each['Message']
        output.append(row)

with open('filename.json', 'w') as out:
    json.dump(output, out, indent=4, sort_keys=False)
Answer by GarciadelCastillo
As a slight improvement to @MONTYHS's answer, iterating through a tuple of field names:
import csv
import json

csvfilename = 'filename.csv'
jsonfilename = csvfilename.split('.')[0] + '.json'
csvfile = open(csvfilename, 'r')
jsonfile = open(jsonfilename, 'w')
reader = csv.DictReader(csvfile)

fieldnames = ('FirstName', 'LastName', 'IDNumber', 'Message')

output = []
for each in reader:
    row = {}
    for field in fieldnames:
        row[field] = each[field]
    output.append(row)

json.dump(output, jsonfile, indent=2, sort_keys=True)
Answer by Snork S
You can try this:
import csvmapper

# how does the object look
mapper = csvmapper.DictMapper([
    [
        {'name': 'FirstName'},
        {'name': 'LastName'},
        {'name': 'IDNumber', 'type': 'int'},
        {'name': 'Messages'}
    ]
])

# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
# conversion service
converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))
Edit: a simpler approach:
import csvmapper

fields = ('FirstName', 'LastName', 'IDNumber', 'Messages')
# CSVParser must be qualified with the module name, since only csvmapper is imported
parser = csvmapper.CSVParser('sample.csv', csvmapper.FieldMapper(fields))

converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))
Answer by Lawrence I. Siden
I took @SingleNegationElimination's response and simplified it into a three-liner that can be used in a pipeline:
import csv
import json
import sys

for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')
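One thing to keep in mind: when csv.DictReader is given no fieldnames argument, it takes the keys from the first row of input, so the three-liner expects a header line. The sample input in the question has none. A variant that passes the field names explicitly might look like this (the function name convert and the io.StringIO demonstration are assumptions made for this sketch):

```python
import csv
import io
import json
import sys  # only needed for the pipeline call shown in the comment below

FIELDNAMES = ("FirstName", "LastName", "IDNumber", "Message")

def convert(instream, outstream, fieldnames=None):
    """Copy CSV records from instream to outstream as one JSON object per line.

    With fieldnames=None the first input row is used as the header.
    """
    for row in csv.DictReader(instream, fieldnames):
        json.dump(row, outstream)
        outstream.write('\n')

# Demonstration on an in-memory, header-less sample.
demo_in = io.StringIO('"John","Doe","001","Message1"\n')
demo_out = io.StringIO()
convert(demo_in, demo_out, FIELDNAMES)

# In a pipeline one would instead call:
#     convert(sys.stdin, sys.stdout, FIELDNAMES)
```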
Answer by impiyush
How about using Pandas to read the CSV file into a DataFrame (pd.read_csv), then manipulating the columns if you want (dropping them or updating values), and finally converting the DataFrame back to JSON (pd.DataFrame.to_json)?
Note: I haven't checked how efficient this will be, but it is definitely one of the easiest ways to manipulate a large CSV and convert it to JSON.
Answer by Mark Channing
I see this is old, but I needed the code from SingleNegationElimination and had an issue with data containing non-UTF-8 characters. These appeared in fields I was not overly concerned with, so I chose to ignore them; however, that took some effort. I am new to Python, so it took some trial and error to get it to work. The code is a copy of SingleNegationElimination's with extra handling of UTF-8. I tried to do it with https://docs.python.org/2.7/library/csv.html but in the end gave up. The code below worked.
import csv, json

# Note: written for Python 2, where row values are byte strings with a .decode method.
csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("Scope","Comment","OOS Code","In RMF","Code","Status","Name","Sub Code","CAT","LOB","Description","Owner","Manager","Platform Owner")
reader = csv.DictReader(csvfile, fieldnames)

code = ''
for row in reader:
    try:
        print('+' + row['Code'])
        for key in row:
            # Drop any bytes that are not valid UTF-8
            row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')
        json.dump(row, jsonfile)
        jsonfile.write('\n')
    except:
        print('-' + row['Code'])
        raise
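The .decode('utf-8', 'ignore') call is specific to Python 2; in Python 3, str has no decode method. A roughly equivalent sketch for Python 3 is to let open() drop undecodable bytes through errors="ignore". The temporary file paths and the three-column layout below are placeholders chosen for the demonstration, not the columns from this answer:

```python
import csv
import json
import os
import tempfile

fieldnames = ("Code", "Name", "Description")  # placeholder columns

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "file.csv")
json_path = os.path.join(workdir, "file.json")

# Build a sample file containing a byte (0xff) that is not valid UTF-8.
with open(csv_path, "wb") as f:
    f.write(b'A1,Widget,caf\xff bar\n')

# errors="ignore" silently drops bytes that cannot be decoded as UTF-8.
with open(csv_path, "r", encoding="utf-8", errors="ignore", newline="") as csvfile, \
     open(json_path, "w", encoding="utf-8") as jsonfile:
    for row in csv.DictReader(csvfile, fieldnames):
        json.dump(row, jsonfile)
        jsonfile.write("\n")

# Read the result back, one JSON document per line.
with open(json_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Using errors="replace" instead would keep a visible marker where bytes were lost, which may be preferable when the affected fields matter.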
Answer by Naufal
You can use a Pandas DataFrame to achieve this, as in the following example:
import pandas as pd
csv_file = pd.DataFrame(pd.read_csv("path/to/file.csv", sep = ",", header = 0, index_col = False))
csv_file.to_json("/path/to/new/file.json", orient = "records", date_format = "epoch", double_precision = 10, force_ascii = True, date_unit = "ms", default_handler = None)
Answer by Laxman
import csv
import json

file = 'csv_file_name.csv'
json_file = 'output_file_name.json'

# Read CSV file
def read_CSV(file, json_file):
    csv_rows = []
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        field = reader.fieldnames
        for row in reader:
            csv_rows.extend([{field[i]: row[field[i]] for i in range(len(field))}])
        convert_write_json(csv_rows, json_file)

# Convert CSV data into JSON
def convert_write_json(data, json_file):
    with open(json_file, "w") as f:
        # Pretty-printed output; use f.write(json.dumps(data)) instead for compact output.
        # Writing both would produce a file that is not valid JSON.
        f.write(json.dumps(data, sort_keys=False, indent=4, separators=(',', ': ')))

read_CSV(file, json_file)