pandas: convert a column of JSON strings into columns of data

Question

Asked by PRATHAMESH

I have a big dataframe of around 30000 rows and a single column containing a JSON string. Each JSON string contains a number of variables and their values. I want to break this JSON string down into columns of data.

Two rows look like this:

0 {"a":"1","b":"2","c":"3"}
1 {"a" :"4","b":"5","c":"6"}

I want to convert this into a dataframe like this:

a   b   c
1   2   3
4   5   6

Please help

Answer 1

Answered by akilat90

Your column values seem to have an extra number before the actual JSON string, so you might want to strip that out first (skip to Method if that isn't the case).

One way to do that is to apply a function to the column:

import re
import pandas as pd

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])

# print(df)
#                            json
# 0   0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number: keep everything from the first "{" onward
def split_num(val):
    p = re.compile(r"(\{.*)")
    return p.search(val).group(1)

# applying the function to every value in the column
df['json'] = df['json'].map(split_num)
print(df)

#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}
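
The same stripping step can also be sketched with pandas' vectorized string methods instead of a Python-level function. This is only an illustration and assumes every cell contains exactly one JSON object, so the text from the first `{` to the last `}` is the part to keep; cells without braces would come out as NaN.

```python
import pandas as pd

df = pd.DataFrame(
    [['0 {"a":"1","b":"2","c":"3"}'], ['1 {"a" :"4","b":"5","c":"6"}']],
    columns=["json"],
)

# Keep everything from the first "{" to the last "}".
# expand=False returns a Series because the pattern has a single group.
df["json"] = df["json"].str.extract(r"(\{.*\})", expand=False)
print(df)
```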

Method:

Once the df is in the above format, the following converts each row entry to a dictionary:

df['json'] = df['json'].map(lambda x: dict(eval(x)))

Then, applying pd.Series to the column will do the job:

d = df['json'].apply(pd.Series)
print(d)
#    a  b  c
# 0  1  2  3
# 1  4  5  6
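
Because `eval` executes whatever expression the string holds, a safer variant of these two steps parses each string with `json.loads` and builds the frame from the resulting list of dicts. A minimal sketch, assuming the strings are valid JSON objects with the leading number already removed (`parsed` and `d` are illustrative names):

```python
import json
import pandas as pd

df = pd.DataFrame(
    [['{"a":"1","b":"2","c":"3"}'], ['{"a" :"4","b":"5","c":"6"}']],
    columns=["json"],
)

# Parse each JSON string into a dict without evaluating it as Python code.
parsed = df["json"].map(json.loads)

# One dict per row becomes one column per key; reuse the original index.
d = pd.DataFrame(parsed.tolist(), index=df.index)
print(d)
```

On a column of around 30000 rows, building the frame once from a list of dicts tends to be noticeably faster than calling `apply(pd.Series)`, which constructs a Series object for every row.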

Answer 2

Answered by kingdion

If you are using dataframes in pandas, you can use a library function called from_dict, which creates a dataframe from a dictionary.

If your data is json, you can convert that into a dict quite easily using the json library.

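import json
import pandas

# json.loads expects a string, so the JSON text must be quoted
my_dict = json.loads('{"a":"4","b":"5","c":"6"}')
# the values are scalars, so wrap the dict as a single row labelled 0
pandas.DataFrame.from_dict({0: my_dict}, orient='index')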

You can apply this logic to your rows.
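
One way this could look for a whole column, as a sketch rather than the answer's own code: parse every row with `json.loads` and pass the list of dicts to the `DataFrame` constructor. The helper `parse_row` is a hypothetical name; it returns an empty dict for strings that are not valid JSON (for example a row with `;` in place of `:`), so those rows come out as NaN instead of raising.

```python
import json
import pandas as pd

def parse_row(text):
    """Return the dict encoded in text, or {} if it is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}

df = pd.DataFrame(
    {"json": ['{"a":"1","b":"2","c":"3"}', '{"a" ;"4","b":"5","c":"6"}']}
)

# Parse row by row, then build the frame from the list of dicts.
rows = [parse_row(text) for text in df["json"]]
result = pd.DataFrame(rows, index=df.index)
print(result)
```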

Answer 3

Answered by nimrodz

import json
import pandas as pd
with open(json_file) as f:  # json_file: path to a file with one JSON object per line
    df = pd.DataFrame(json.loads(line) for line in f)
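
This snippet assumes `json_file` is the path to a file with one JSON object per line. For that layout, `pandas.read_json` with `lines=True` is a possible alternative. The sketch below writes a small temporary file (its contents are made up for illustration) and reads it both ways.

```python
import json
import os
import tempfile

import pandas as pd

# Write two sample records, one JSON object per line (illustrative data).
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"a":"1","b":"2","c":"3"}\n{"a":"4","b":"5","c":"6"}\n')
    json_file = tmp.name

# Line-by-line parsing with json.loads.
with open(json_file) as f:
    df_manual = pd.DataFrame(json.loads(line) for line in f)

# The same file read by pandas directly; dtype=False asks pandas not to infer dtypes.
df_lines = pd.read_json(json_file, lines=True, dtype=False)

os.remove(json_file)
print(df_manual)
print(df_lines)
```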
