pandas: convert a column of json strings into columns of data

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/50656469/

Date: 2020-09-14 05:38:12  Source: igfitidea

Convert a column of json strings into columns of data

Tags: python, json, pandas, dataframe

Asked by PRATHAMESH

I have a large dataframe of around 30000 rows with a single column containing a json string. Each json string contains a number of variables and their values. I want to break these json strings down into columns of data.

Two rows look like this:

0 {"a":"1","b":"2","c":"3"}
1 {"a":"4","b":"5","c":"6"}

I want to convert this into a dataframe like this:

a   b   c
1   2   3
4   5   6

Please help.

Answered by akilat90

Your column values seem to have an extra number before the actual json string, so you might want to strip that out first (skip to Method if that isn't the case).

One way to do that is to apply a function to the column:

import re

import pandas as pd

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])

# print(df)
#                            json
# 0  0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number: keep everything from the first "{" onward
def split_num(val):
    return re.search(r"(\{.*)", val).group(1)

# applying the function
df['json'] = df['json'].map(split_num)
print(df)

#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}
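
The same stripping step can probably also be done without a Python-level function, using the vectorized string methods pandas provides. The following is a minimal sketch, assuming every cell contains exactly one json object wrapped in braces; the variable name cleaned is only illustrative and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']],
    columns=['json'],
)

# Keep everything from the first "{" to the last "}" in each cell.
# expand=False returns a Series because the pattern has a single group.
cleaned = df['json'].str.extract(r'(\{.*\})', expand=False)
print(cleaned.tolist())
```

Cells that do not match the pattern come back as NaN, so it may be worth checking for missing values before parsing.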


Method:


Once the df is in the above format, the following converts each row entry to a dictionary:

df['json'] = df['json'].map(lambda x: dict(eval(x)))

Then, applying pd.Series to the column will do the job:

d = df['json'].apply(pd.Series)
print(d)
#   a  b  c
# 0  1  2  3
# 1  4  5  6
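
Since eval executes whatever code the string contains, it is risky on data you do not fully control. A possible alternative, sketched here rather than taken from the answer, is to parse each string with json.loads and build the frame from the resulting list of dicts. This assumes the column already holds clean json strings; the name records is illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame(
    ['{"a":"1","b":"2","c":"3"}', '{"a" :"4","b":"5","c":"6"}'],
    columns=['json'],
)

# json.loads only parses JSON; unlike eval it never executes the string as code.
records = df['json'].map(json.loads).tolist()

# Build the frame from the list of dicts in one step, keeping the original index.
d = pd.DataFrame(records, index=df.index)
print(d)
```

Building the frame once from a list of dicts also avoids creating one pd.Series per row, which tends to matter at around 30000 rows.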

Answered by kingdion

If you are using dataframes in pandas, you can use one of the library functions, from_dict, which creates a dataframe from a dictionary.

If your data is json, you can convert it into a dict quite easily using the json library.

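import json
import pandas

# json.loads expects a string, so the json text must be quoted
my_dict = json.loads('{"a":"4","b":"5","c":"6"}')
# all values are scalars, so give from_dict a row label (0) via orient='index'
pandas.DataFrame.from_dict({0: my_dict}, orient='index')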

You can apply this logic to your rows.
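
One way this could be applied to every row is sketched below. It assumes the column is named json and holds clean json strings, as in the question; the names parsed and result are illustrative only.

```python
import json

import pandas as pd

df = pd.DataFrame(
    ['{"a":"1","b":"2","c":"3"}', '{"a":"4","b":"5","c":"6"}'],
    columns=['json'],
)

# Map each row label to its parsed dict: {0: {...}, 1: {...}}
parsed = {label: json.loads(text) for label, text in df['json'].items()}

# orient='index' treats the outer keys as row labels and the inner keys as columns.
result = pd.DataFrame.from_dict(parsed, orient='index')
print(result)
```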


Answered by nimrodz

import json
import pandas as pd

# json_file: path to a file with one json object per line
with open(json_file) as f:
    df = pd.DataFrame(json.loads(line) for line in f)
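
To try this approach end to end, here is a small self-contained sketch. The temporary file and its contents are made up for illustration, and skipping blank lines is an added assumption about the input, since json.loads raises an error on an empty line.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample file with one JSON object per line (and one blank line).
with tempfile.NamedTemporaryFile('w', suffix='.jsonl', delete=False) as tmp:
    tmp.write('{"a":"1","b":"2","c":"3"}\n\n{"a":"4","b":"5","c":"6"}\n')
    json_file = tmp.name

with open(json_file) as f:
    # Skip blank lines, which json.loads would reject.
    df = pd.DataFrame([json.loads(line) for line in f if line.strip()])

os.remove(json_file)
print(df)
```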