使用 Pandas 访问 json 列
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/43668533/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Access a json column with pandas
提问by Reimus Klinsman
I have a csv file where one column is json. I want to be able to access the information in the json column but I can't figure it out.
我有一个 csv 文件,其中一列是 json。我希望能够访问 json 列中的信息,但我无法弄清楚。
My csv file is like
我的 csv 文件就像
id, "letter", "json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"
I'm reading in the like like
我正在阅读之类的
test = pd.read_csv(filename)
df = pd.DataFrame(test)
I'd like to be able to get all the rows that have "sub" in the json column and ultimately be able to get the values for those keys.
我希望能够获得在 json 列中具有“sub”的所有行,并最终能够获得这些键的值。
回答by andrew_reece
Here's one approach, which uses the read_csvconvertersargument to build jsonas JSON. Then use applyto select on the jsonfield keys in each row. CustomParsertaken from this answer.
这是一种方法,它使用read_csvconverters参数构建json为 JSON。然后使用apply选择json每行中的字段键。CustomParser取自这个答案。
EDIT
Updated to look two levels deep, and takes variable targetparameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.
编辑
更新以查看两层深,并采用可变target参数(因此可以根据需要“添加”或“子”)。但是,此解决方案不会处理任意数量的级别。
def CustomParser(data):
import json
j1 = json.loads(data)
return j1
df = pd.read_csv('test.csv', converters={'json':CustomParser})
def check_keys(json, target):
if target in json:
return True
for key in json:
if isinstance(json[key], dict):
if target in json[key]:
return True
return False
print(df.loc[df.json.apply(check_keys, args=('sub',))])
id letter json
1 2 b {'sub': 5}
2 3 c {'add': {'sub': 4}}
回答by Psidom
When you read the file in, the jsonfield will still be of strtype, you can use ast.literal_evalto convert the string to dictionary, and then use applymethod to check if any cell contain the key add:
当您读入文件时,该json字段仍为str类型,您可以使用ast.literal_eval将字符串转换为字典,然后使用apply方法检查是否有任何单元格包含该键add:
from ast import literal_eval
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]
# id letter json
#0 1 a {'add': 2}
#2 3 c {'add': {'sub': 4}}
In case you want to check nested keys:
如果您想检查嵌套键:
def check_add(d):
if "add" in d:
return True
for k in d:
if isinstance(d[k], dict):
if check_add(d[k]):
return True
return False
df[df["json"].apply(check_add)]
# id letter json
#0 1 a {'add': 2}
#2 3 c {'add': {'sub': 4}}
This doesn't check nested values other than dictionary; If you need to, it should be similar to implement based on your data.
这不会检查字典以外的嵌套值;如果需要,它应该类似于根据您的数据实施。

