Convert JSON to Pandas DataFrame

Question

Asked by Yashvardhan Nanavati

I have a JSON file which has multiple objects such as:

 {"reviewerID": "bc19970fff3383b2fe947cf9a3a5d7b13b6e57ef2cd53abc52bb2dfedf5fb1cd", "asin": "a6ed402934e3c1138111dce09256538afb04c566edf37c16b9ba099d23afb764", "overall": 2.0, "helpful": {"nHelpful": 1, "outOf": 1}, "reviewText": "This remote, for whatever reason, was chosen by Time Warner to replace their previous silver remote, the Time Warner Synergy V RC-U62CP-1.12S.  The actual function of this CLIKR-5 is OK, but the ergonomic design sets back remotes by 20 years.  The buttons are all the same, there's no separation of the number buttons, the volume and channel buttons are the same shape as the other buttons on the remote, and it all adds up to a crappy user experience.  Why would TWC accept this as a replacement?    I'm skipping this and paying double for a refurbished Synergy V.", "summary": "Ergonomic nightmare", "unixReviewTime": 1397433600}

{"reviewerID": "3689286c8658f54a2ff7aa68ce589c81f6cae4c4d9de76fa0f66d5c114f79837", "asin": "8939d791e9dd035aa58da024ace69b20d651cea4adf6159d984872b44f663301", "overall": 4.0, "helpful": {"nHelpful": 21, "outOf": 22}, "reviewText": "This is a great truck GPS. I've tried others and nothing seems to come close to the Rand McNally TND-700.Excellent screen size and resolution. The audio is loud enough to be heard over road noise and the purr of my Kenworth/Cat engine. I've used it for the last 8,000 miles or so and it has only glitched once. Just restarted it and it picked up on my route right where it should have.Clean up the minor issues and this unit rates a solid 5.Rand McNally 528881469 7-inch Intelliroute TND 700 Truck GPS", "summary": "Great Unit!", "unixReviewTime": 1280016000}

I am trying to convert it to a Pandas DataFrame using the following code:

import json
import pandas as pd

frames = []
count = 0
with open('train.json') as f:
    for l in f:
        count += 1
        if count == 20001:  # only read the first 20,000 lines
            break
        try:
            obj1 = json.loads(l)
        except ValueError:
            # retry after stripping the backslashes that break json.loads
            obj1 = json.loads(l.replace('\\', ''))
        frames.append(pd.DataFrame(obj1, index=[0]))

# concatenate all one-row frames in a single call
train_df = pd.concat(frames, ignore_index=True)

However, it gives me NaN for nested values, i.e. the 'helpful' attribute. I want each key of the nested attribute (nHelpful and outOf) to end up in its own column.
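
A minimal sketch of where the NaN comes from, assuming one record trimmed from the sample above to two fields. When the DataFrame constructor receives a dict value together with an explicit index, pandas treats that nested dict like a Series keyed by 'nHelpful' and 'outOf' and looks up the index label 0 in it, which matches neither key. Flattening the record first (here with pd.json_normalize, a swapped-in technique rather than the question's code) turns each nested key into a column.

```python
import pandas as pd

# One record trimmed from the sample data (other fields omitted)
record = {"overall": 2.0, "helpful": {"nHelpful": 1, "outOf": 1}}

# The nested dict is aligned to index label 0, which is not one of its keys,
# so the 'helpful' cell ends up as NaN
df = pd.DataFrame(record, index=[0])
print(df)

# Flattening the record first gives each nested key its own dotted column
flat = pd.json_normalize(record)
print(flat.columns.tolist())
```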

EDIT:

P.S.: I am using try/except because some objects contain a '\' character, which gives me a JSON decode error.
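
JSON only allows a backslash before `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t` or `uXXXX`; anything else makes json.loads raise an "Invalid \escape" error. Stripping every backslash also destroys valid escapes such as `\"`, which can itself produce invalid JSON. A minimal sketch of an alternative, assuming the bad lines contain stray backslashes before characters that are not JSON escapes: double only those backslashes and retry. The helper names `fix_invalid_escapes` and `load_line` are hypothetical.

```python
import json
import re

# Matches either a valid JSON escape (group 1 is set) or any other backslash
_ESCAPE = re.compile(r'\\(["\\/bfnrt]|u[0-9a-fA-F]{4})|\\')


def fix_invalid_escapes(line: str) -> str:
    """Double every backslash that does not start a valid JSON escape."""
    def repl(m):
        # keep valid escapes untouched, turn a stray backslash into "\\"
        return m.group(0) if m.group(1) else "\\\\"
    return _ESCAPE.sub(repl, line)


def load_line(line: str) -> dict:
    """Parse one JSON line, repairing stray backslashes only if parsing fails."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return json.loads(fix_invalid_escapes(line))
```

Because valid escapes such as `\\` are consumed as a whole pair by the regex, an already-escaped backslash is never doubled a second time.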

Can anyone help? Is there any other approach I can use?

Thank you.

Answer 1

Accepted answer by Nickil Maveli

Use json_normalize on the list of dictionaries. It is reasonably fast even for a large number of JSON objects, because the DataFrame is built in a single call instead of being appended to one row at a time.

import json

import pandas as pd

my_list = []
with open('train.json') as f:
    for line in f:
        line = line.replace('\\', '')  # strip backslashes so json.loads does not fail
        my_list.append(json.loads(line))

# nested keys become columns such as helpful.nHelpful and helpful.outOf;
# append .T only if you want one column per record instead
result_df = pd.json_normalize(my_list)
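
A self-contained sketch of the same idea, assuming two records trimmed from the question's sample and held in memory with io.StringIO instead of reading train.json. It uses the top-level pd.json_normalize (available since pandas 1.0) and itertools.islice to mirror the question's 20,000-line cap; `MAX_LINES` is a hypothetical name.

```python
import io
import json
from itertools import islice

import pandas as pd

# Two records trimmed from the question's sample (long text fields dropped)
sample = io.StringIO(
    '{"overall": 2.0, "helpful": {"nHelpful": 1, "outOf": 1}}\n'
    '{"overall": 4.0, "helpful": {"nHelpful": 21, "outOf": 22}}\n'
)

MAX_LINES = 20000  # same cap as the question's counter

# Parse at most MAX_LINES lines into a list of dicts, skipping blank lines
records = [json.loads(line) for line in islice(sample, MAX_LINES) if line.strip()]

# One call builds the whole frame; nested keys become dotted column names
df = pd.json_normalize(records)
print(df)
```

If dotted names are inconvenient, passing `sep="_"` to pd.json_normalize produces names like `helpful_nHelpful` instead.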

Answer 2

Answer by piRSquared

Try:

pd.concat([pd.Series(json.loads(line)) for line in open('train.json')], axis=1)
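
With this approach each record becomes a column and the 'helpful' cell still holds a whole dict. A minimal sketch of one way to finish the job, assuming two in-memory records trimmed from the question's sample: transpose so records are rows, then expand the nested dicts into their own columns. The `helpful_` prefix is an arbitrary choice for illustration.

```python
import io
import json

import pandas as pd

# Two records trimmed from the question's sample
sample = io.StringIO(
    '{"overall": 2.0, "helpful": {"nHelpful": 1, "outOf": 1}}\n'
    '{"overall": 4.0, "helpful": {"nHelpful": 21, "outOf": 22}}\n'
)

# One Series per record, placed side by side: each record is a column
wide = pd.concat([pd.Series(json.loads(line)) for line in sample], axis=1)

# Transpose so records are rows; the 'helpful' column holds dicts
rows = wide.T

# Expand the dicts into separate columns aligned on the same row index
helpful = pd.DataFrame(rows["helpful"].tolist(), index=rows.index).add_prefix("helpful_")

# Replace the dict column with the expanded ones and restore numeric dtypes
df = rows.drop(columns="helpful").join(helpful).infer_objects()
print(df)
```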
