Access a JSON column with pandas
Original question: http://stackoverflow.com/questions/43668533/
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Access a json column with pandas
Asked by Reimus Klinsman
I have a csv file where one column is json. I want to be able to access the information in the json column but I can't figure it out.
My csv file looks like this:
id, "letter", "json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"
I'm reading the file in like this:
test = pd.read_csv(filename)
df = pd.DataFrame(test)
I'd like to be able to get all the rows that have "sub" in the json column and ultimately be able to get the values for those keys.
Answered by andrew_reece
Here's one approach, which uses the read_csv converters argument to parse the json column as JSON. Then use apply to select on the json field keys in each row. CustomParser is taken from this answer.
EDIT
Updated to look two levels deep and to take a variable target parameter (so it can be "add" or "sub", as needed). This solution won't handle an arbitrary number of levels, though.
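import json
import pandas as pd

def CustomParser(data):
    # Parse the raw cell string into a Python object.
    return json.loads(data)

# Run the parser on the 'json' column while the file is being read.
df = pd.read_csv('test.csv', converters={'json': CustomParser})

def check_keys(data, target):
    # Level one: the target is a top-level key.
    if target in data:
        return True
    # Level two: the target is a key of a nested dict.
    for key in data:
        if isinstance(data[key], dict):
            if target in data[key]:
                return True
    return False

print(df.loc[df.json.apply(check_keys, args=('sub',))])

   id letter                 json
1   2      b           {'sub': 5}
2   3      c  {'add': {'sub': 4}}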
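The question also asks for the values stored under the matching keys, which the filter above does not return. Here is a minimal sketch of one way to collect them, assuming the sample data from the question (inlined through io.StringIO, with a header that has no spaces after the commas, so it runs without a file). The helper name find_values and the sub_values column are hypothetical, not part of the original answer.

```python
import io
import json

import pandas as pd

# Sample data from the question, inlined so the example runs without a file.
CSV_TEXT = '''id,"letter","json"
1,"a","{""add"": 2}"
2,"b","{""sub"": 5}"
3,"c","{""add"": {""sub"": 4}}"
'''

def find_values(obj, target):
    """Hypothetical helper: list every value stored under `target` in nested dicts."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                found.append(value)
            # Keep descending so deeper occurrences are collected too.
            found.extend(find_values(value, target))
    return found

# Parse the 'json' column with json.loads while reading.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"json": json.loads})

# One list of matching values per row; empty when the key is absent.
df["sub_values"] = df["json"].apply(find_values, args=("sub",))

# Keep only the rows where at least one value was found.
matches = df[df["sub_values"].map(len) > 0]
print(matches[["id", "letter", "sub_values"]])
```

Unlike check_keys, this sketch recurses, so it is not limited to two levels; it only descends into dicts, though.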
Answered by Psidom
When you read the file in, the json field will still be of str type. You can use ast.literal_eval to convert each string to a dictionary, and then use the apply method to check whether a cell contains the key add:
from ast import literal_eval
df["json"] = df["json"].apply(literal_eval)
df[df["json"].apply(lambda d: "add" in d)]
#    id letter                 json
# 0   1      a           {'add': 2}
# 2   3      c  {'add': {'sub': 4}}
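One caveat, offered as an assumption about other data rather than about the sample above: ast.literal_eval parses Python literals, so it works here because these particular JSON strings also happen to be valid Python. JSON-only values such as true, false, or null are not Python literals. A small sketch using the standard library's json.loads instead; the cell contents are made up for illustration.

```python
import json
from ast import literal_eval

import pandas as pd

# Made-up cells for illustration; the second one uses a JSON-only literal (null).
cells = pd.Series(['{"add": 2}', '{"sub": null}'])

# json.loads understands JSON literals: null becomes Python None.
parsed = cells.apply(json.loads)
print(parsed.tolist())

# literal_eval rejects the same input because null is not a Python literal.
try:
    cells.apply(literal_eval)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
print(literal_eval_ok)
```

If the column is known to hold real JSON, json.loads is probably the safer choice; literal_eval remains useful when the strings are Python reprs (single-quoted keys, for instance).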
In case you want to check nested keys:
def check_add(d):
    if "add" in d:
        return True
    for k in d:
        if isinstance(d[k], dict):
            if check_add(d[k]):
                return True
    return False

df[df["json"].apply(check_add)]
#    id letter                 json
# 0   1      a           {'add': 2}
# 2   3      c  {'add': {'sub': 4}}
This doesn't check nested values other than dictionaries; if you need to handle other types, a similar approach can be implemented based on your data.
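As one possible extension along those lines, here is a sketch that also descends into lists and takes the key as a parameter. The helper name contains_key and the sample records are hypothetical; whether lists actually appear in the column is an assumption.

```python
def contains_key(obj, target):
    """Hypothetical helper: True if `target` is a key anywhere in nested dicts or lists."""
    if isinstance(obj, dict):
        if target in obj:
            return True
        # Recurse into every value of the dict.
        return any(contains_key(value, target) for value in obj.values())
    if isinstance(obj, list):
        # Recurse into every element of the list.
        return any(contains_key(item, target) for item in obj)
    # Scalars (numbers, strings, None) cannot contain keys.
    return False

# Made-up records for illustration.
records = [
    {"add": 2},
    {"ops": [{"sub": 5}, {"add": 1}]},
    {"add": {"mul": [{"sub": 4}]}},
]
results = [contains_key(record, "sub") for record in records]
print(results)
```

With a parsed column, the same function could be used as df[df["json"].apply(contains_key, args=("sub",))].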