使用 Pandas 访问 json 列

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/43668533/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:29:46  来源:igfitidea点击:

Access a json column with pandas

pythonjsonpandas

提问by Reimus Klinsman

I have a csv file where one column is json. I want to be able to access the information in the json column but I can't figure it out.

我有一个 csv 文件,其中一列是 json。我希望能够访问 json 列中的信息,但我无法弄清楚。

My csv file is like

我的 csv 文件就像

id, "letter", "json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"

I'm reading in the like like

我正在阅读之类的

test = pd.read_csv(filename)
df = pd.DataFrame(test)

I'd like to be able to get all the rows that have "sub" in the json column and ultimately be able to get the values for those keys.

我希望能够获得在 json 列中具有“sub”的所有行,并最终能够获得这些键的值。

回答by andrew_reece

Here's one approach, which uses the read_csvconvertersargument to build jsonas JSON. Then use applyto select on the jsonfield keys in each row. CustomParsertaken from this answer.

这是一种方法,它使用read_csvconverters参数构建json为 JSON。然后使用apply选择json每行中的字段键。CustomParser取自这个答案

EDIT
Updated to look two levels deep, and takes variable targetparameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.

编辑
更新以查看两层深,并采用可变target参数(因此可以根据需要“添加”或“子”)。但是,此解决方案不会处理任意数量的级别。

def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1

df = pd.read_csv('test.csv', converters={'json':CustomParser})

def check_keys(json, target):
    if target in json:
        return True
    for key in json:
        if isinstance(json[key], dict):
            if target in json[key]:
                return True
    return False

print(df.loc[df.json.apply(check_keys, args=('sub',))])

   id letter                 json
1   2      b           {'sub': 5}
2   3      c  {'add': {'sub': 4}}

回答by Psidom

When you read the file in, the jsonfield will still be of strtype, you can use ast.literal_evalto convert the string to dictionary, and then use applymethod to check if any cell contain the key add:

当您读入文件时,该json字段仍为str类型,您可以使用ast.literal_eval将字符串转换为字典,然后使用apply方法检查是否有任何单元格包含该键add

from ast import literal_eval
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}


In case you want to check nested keys:

如果您想检查嵌套键:

def check_add(d):
    if "add" in d:
        return True

    for k in d:
        if isinstance(d[k], dict):
            if check_add(d[k]):
                return True

    return False

df[df["json"].apply(check_add)]

#  id   letter  json
#0  1       a   {'add': 2}
#2  3       c   {'add': {'sub': 4}}

This doesn't check nested values other than dictionary; If you need to, it should be similar to implement based on your data.

这不会检查字典以外的嵌套值;如果需要,它应该类似于根据您的数据实施。