pandas: convert a column of JSON strings into columns of data
Original question: http://stackoverflow.com/questions/50656469/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert a column of json strings into columns of data
Asked by PRATHAMESH
I have a big dataframe of around 30,000 rows and a single column containing a JSON string. Each JSON string contains a number of variables and their values. I want to break this JSON string down into columns of data.
Two rows look like this:
0 {"a":"1","b":"2","c":"3"}
1 {"a" :"4","b":"5","c":"6"}
I want to convert this into a dataframe like this:
a b c
1 2 3
4 5 6
Please help
Answered by akilat90
Your column values seem to have an extra number before the actual JSON string, so you might want to strip that out first (skip to Method if that isn't the case).
One way to do that is to apply a function to the column:
import re
import pandas as pd

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])
# print(df)
#                             json
# 0   0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number
def split_num(val):
    # keep everything from the first "{" onward
    p = re.compile(r"({.*)")
    return p.search(val).group(1)

# applying the function
df['json'] = df['json'].map(split_num)
print(df)
#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}
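The same stripping step can probably be done without a Python-level function by using the vectorized string accessor. This is only a sketch on the answer's sample data; the variable name `cleaned` is made up for illustration, and it assumes every cell contains exactly one JSON object.

```python
import pandas as pd

# Same sample data as the answer: a stray row number precedes each JSON string.
df = pd.DataFrame(
    [['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']],
    columns=['json'],
)

# Capture everything from the first "{" to the last "}" in each cell.
# expand=False returns a Series rather than a one-column DataFrame.
cleaned = df['json'].str.extract(r'(\{.*\})', expand=False)
print(cleaned.tolist())
```

Cells that do not match the pattern come back as NaN, so it may be worth checking for missing values before parsing.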
Method:
Once the df is in the above format, the line below will convert each row entry to a dictionary:
df['json'] = df['json'].map(lambda x: dict(eval(x)))
Then, applying pd.Series to the column will do the job:
d = df['json'].apply(pd.Series)
print(d)
# a b c
# 0 1 2 3
# 1 4 5 6
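One caveat with the eval step: eval executes whatever Python expression the string contains, and it fails on JSON literals such as true, false and null. A possible variant, not part of the original answer, swaps in json.loads from the standard library and keeps the pd.Series step unchanged. The name `parsed` is introduced here only for illustration.

```python
import json
import pandas as pd

# Sample data already stripped of the leading row number.
df = pd.DataFrame(
    [['{"a":"1","b":"2","c":"3"}'], ['{"a" :"4","b":"5","c":"6"}']],
    columns=['json'],
)

# json.loads only parses JSON text; unlike eval it does not execute code.
parsed = df['json'].map(json.loads)

# Expand each dict into columns, as in the answer's final step.
d = parsed.apply(pd.Series)
print(d)
```

The values stay as strings here because the JSON quotes them; converting to numbers would be a separate step.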
Answered by kingdion
If you are using dataframes in pandas, you can use the library function from_dict, which creates a dataframe from a dictionary.
If your data is JSON, you can convert it into a dict quite easily using the json library.
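import json
import pandas

my_dict = json.loads('{"a":"4","b":"5","c":"6"}')
# from_dict needs list-like values; wrap each scalar so the dict becomes one row
pandas.DataFrame.from_dict({key: [value] for key, value in my_dict.items()})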
You can apply this logic to your rows.
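Applying that logic to every row might look like the sketch below. It assumes the column is named `json`, as in the other answer, and that each cell is a clean JSON object; `records` and `result` are illustrative names.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {'json': ['{"a":"1","b":"2","c":"3"}', '{"a":"4","b":"5","c":"6"}']}
)

# Parse every row into a dict, then let the DataFrame constructor
# turn the list of dicts into one row per dict and one column per key.
records = [json.loads(s) for s in df['json']]
result = pd.DataFrame(records, index=df.index)
print(result)
```

Passing the original index keeps the new frame aligned with df, which makes it straightforward to join the parsed columns back if needed.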
Answered by nimrodz
import json
import pandas as pd
with open(json_file) as f:
    df = pd.DataFrame(json.loads(line) for line in f)
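If the data really does live in a file with one JSON object per line, pandas can likely read it directly with read_json and lines=True. The sketch below uses an in-memory io.StringIO buffer as a hypothetical stand-in for that file, so the name `json_lines` is an assumption and not something from the answer.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: one JSON object per line.
json_lines = io.StringIO(
    '{"a":"1","b":"2","c":"3"}\n{"a":"4","b":"5","c":"6"}\n'
)

# lines=True tells read_json to parse each line as a separate record.
df = pd.read_json(json_lines, lines=True)
print(df)
```

Note that read_json may infer numeric dtypes by default, so quoted digits can come back as numbers rather than strings.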