Accessing a JSON column with Pandas

Question

Asked by Reimus Klinsman

I have a csv file where one column is json. I want to be able to access the information in the json column but I can't figure it out.

My csv file looks like this:

id,"letter","json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"

I'm reading the file in like this:

import pandas as pd

test = pd.read_csv(filename)
df = pd.DataFrame(test)  # redundant: read_csv already returns a DataFrame

I'd like to be able to get all the rows that have "sub" in the json column and ultimately be able to get the values for those keys.

Answer 1

Answered by andrew_reece

Here's one approach, which uses the read_csv converters argument to parse the json column as JSON. Then use apply to select on the keys of the json field in each row. CustomParser is taken from another answer.

EDIT
Updated to look two levels deep, and to take a variable target parameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.

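import json

import pandas as pd

def CustomParser(data):
    # Parse the raw JSON string from one cell into a Python object
    return json.loads(data)

# converters applies CustomParser to every value in the 'json' column
df = pd.read_csv('test.csv', converters={'json': CustomParser})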

def check_keys(obj, target):
    # True if target is a key at the top level ...
    if target in obj:
        return True
    # ... or one level down, inside a nested dict
    for key in obj:
        if isinstance(obj[key], dict):
            if target in obj[key]:
                return True
    return False

print(df.loc[df['json'].apply(check_keys, args=('sub',))])

   id letter                 json
1   2      b           {'sub': 5}
2   3      c  {'add': {'sub': 4}}
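
The question also asks for the values stored under the matching keys, which the filter above does not return. Here is a minimal sketch of one way to collect them, assuming the json column has already been parsed into dicts (as the converters step does). find_values is a hypothetical helper, not part of pandas, and the sample frame is rebuilt inline from the question's data so the snippet runs on its own:

```python
import json

import pandas as pd

def find_values(obj, target):
    """Collect every value stored under `target` at any depth of nested dicts."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                found.append(value)
            # Recurse so that nested dicts are searched too
            found.extend(find_values(value, target))
    return found

# Sample data mirroring the question's CSV, already parsed into dicts
raw_cells = ('{"add": 2}', '{"sub": 5}', '{"add": {"sub": 4}}')
df = pd.DataFrame({
    "id": [1, 2, 3],
    "letter": ["a", "b", "c"],
    "json": [json.loads(cell) for cell in raw_cells],
})

# One list of matching values per row; an empty list means no match
df["sub_values"] = df["json"].apply(lambda d: find_values(d, "sub"))
matches = df[df["sub_values"].map(len) > 0]
print(matches[["id", "letter", "sub_values"]])
```

Returning a list per row keeps the case where the same key appears at more than one depth; if only one match is expected, the first element can be taken instead.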

Answer 2

Answered by Psidom

When you read the file in, the json field will still be of type str. You can use ast.literal_eval to convert each string to a dictionary, and then use the apply method to check whether a cell contains the key add:

from ast import literal_eval
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}
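
Note that literal_eval works here because these particular JSON strings also happen to be valid Python literals. JSON containing true, false or null is not, so json.loads is likely the safer parser for general JSON text. A small sketch of the difference, using a made-up sample string that is not from the question's data:

```python
import json
from ast import literal_eval

raw = '{"add": 2, "active": true, "note": null}'  # hypothetical sample value

# json.loads understands JSON's true/false/null
parsed = json.loads(raw)
print(parsed)

# literal_eval only accepts Python literals, so the bare name `true` is rejected
try:
    literal_eval(raw)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
print(literal_eval_ok)
```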

In case you want to check nested keys:

def check_add(d):
    if "add" in d:
        return True

    for k in d:
        if isinstance(d[k], dict):
            if check_add(d[k]):
                return True

    return False

df[df["json"].apply(check_add)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}

This doesn't check nested values other than dictionaries; if you need that, it should be straightforward to implement in a similar way based on your data.
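
As an illustration, the check could be extended to look inside lists as well as dicts. This is a sketch only: has_key is a hypothetical name, and the sample value is invented, since the question's data contains no lists:

```python
def has_key(obj, target):
    """Return True if `target` is a key anywhere inside nested dicts and lists."""
    if isinstance(obj, dict):
        if target in obj:
            return True
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        # Scalars (numbers, strings, None) cannot contain keys
        return False
    return any(has_key(child, target) for child in children)

sample = {"ops": [{"add": 1}, {"mul": [{"sub": 4}]}]}  # invented example
print(has_key(sample, "sub"))
print(has_key(sample, "div"))
```

It can be applied to the column in the same way as check_add, for example df[df["json"].apply(has_key, args=("add",))].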
