pandas: How to fix json_normalize when it cannot iterate over a column to flatten?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/51153854/
Asked by RustyShackleford
I have a dataframe that looks like this:
ID  phone_numbers
1   [{u'updated_at': u'2017-12-02 15:29:54', u'created_at': u'2017-12-02 15:29:54',
     u'sms': 0, u'number': u'1112223333', u'consumer_id': 12345, u'organization_id': 1,
     u'active': 1, u'deleted_at': None, u'type': u'default', u'id': 1234}]
I want to take the phone_numbers column and flatten the information inside it so that I can query, for example, the 'id' field.
When I try:
json_normalize(df.phone_numbers)
I get this error:
AttributeError: 'str' object has no attribute 'itervalues'
I am not sure why this error is raised or why I cannot flatten this column.
EDIT:
The data was originally a JSON string read from a response object (r.text):
https://docs.google.com/document/d/1Iq4PMcGXWx6O48sWqqYnZjG6UMSZoXfmN1WadQLkWYM/edit?usp=sharing
EDIT:
I converted the column I need to flatten to JSON with this command:
a = df.phone_numbers.to_json()
{"0":[{"updated_at":"2018-04-12 12:24:04","created_at":"2018-04-12 12:24:04","sms":0,"number":"","consumer_id":123,"org_id":123,"active":1,"deleted_at":null,"type":"default","id":123}]}
Answered by jezrael
Use a list comprehension to flatten the lists, adding the ID as a new element to each dictionary:
import pandas as pd

# Sample frame: each row holds an ID and a list of dicts
df = pd.DataFrame({'ID': [1, 2],
                   'phone_numbers': [[{'a': '2017', 'b': '2017', 'sms': 1},
                                      {'a': '2018', 'b': '2017', 'sms': 2}],
                                     [{'a': '2017', 'b': '2017', 'sms': 3}]]})
print(df)
   ID                                      phone_numbers
0   1  [{'a': '2017', 'b': '2017', 'sms': 1}, {'a': '...
1   2             [{'a': '2017', 'b': '2017', 'sms': 3}]
# One output row per inner dict, with the parent row's ID added
df = pd.DataFrame([dict(y, ID=i) for i, x in df.values.tolist() for y in x])
print(df)
   ID     a     b  sms
0   1  2017  2017    1
1   1  2018  2017    2
2   2  2017  2017    3
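For readers less familiar with nested comprehensions, the one-liner above can be unrolled into explicit loops. The following is a minimal sketch in plain Python, assuming each row is an [ID, list-of-dicts] pair like those returned by df.values.tolist(); the names rows, flat, row_id and phone are illustrative and not part of the original answer.

```python
# Rows shaped like df.values.tolist() for the sample frame:
# each row is [ID, list of phone-number dicts].
rows = [
    [1, [{'a': '2017', 'b': '2017', 'sms': 1},
         {'a': '2018', 'b': '2017', 'sms': 2}]],
    [2, [{'a': '2017', 'b': '2017', 'sms': 3}]],
]

flat = []
for row_id, phone_list in rows:
    for phone in phone_list:
        # dict(phone, ID=row_id) copies the dict and adds an 'ID' key,
        # leaving the original dict untouched.
        flat.append(dict(phone, ID=row_id))

for record in flat:
    print(record)
```

Passing a list like flat to pd.DataFrame is what produces one row per phone-number dict in the answer above.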
EDIT:
df = pd.DataFrame({'phone_numbers': {"0": [{"type": "default", "id": 123}]}})
df = pd.DataFrame([y for x in df['phone_numbers'].values.tolist() for y in x])
print(df)
    id     type
0  123  default
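If only the JSON string from to_json() is at hand, the same flattening can be sketched with the standard json module. This is an illustrative sketch, not part of the original answer: the string a is a shortened copy of the sample from the question, and the names parsed, records and the added 'row' key are assumptions made for this example.

```python
import json

# JSON string in the shape produced by df.phone_numbers.to_json()
# (fields shortened from the sample in the question).
a = '{"0":[{"sms":0,"number":"","active":1,"deleted_at":null,"type":"default","id":123}]}'

# Parsed shape: row label -> list of phone-number dicts.
parsed = json.loads(a)

# One record per phone-number dict, keeping the row label under 'row'.
records = [dict(item, row=label)
           for label, items in parsed.items()
           for item in items]

ids = [rec["id"] for rec in records]
print(ids)
```

The resulting records list has the same one-dict-per-row shape that pd.DataFrame accepts in the answer above.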
Answered by alvaro nortes
I am not sure, but I think json_normalize expects JSON-like data (a dict or a list of dicts) as its first argument, not a pd.Series. Convert the series to a dict or a list of dicts first. You could use to_dict():
json_normalize(df.phone_numbers.to_dict())
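The AttributeError in the question mentions a 'str' object, which may mean that the cells hold string representations of lists rather than actual lists. The question does not confirm this, so treat it as an assumption. Under that assumption, a minimal sketch of turning one such string back into Python objects with the standard library's ast.literal_eval looks like this; the variable cell is a hypothetical, shortened value modeled on the question's sample.

```python
import ast

# Hypothetical cell value: a string that only looks like a list of dicts.
cell = ("[{u'sms': 0, u'number': u'1112223333', u'deleted_at': None, "
        "u'type': u'default', u'id': 1234}]")

# literal_eval evaluates Python literals only (no function calls),
# so it accepts the u'' prefixes and None seen in the question's data.
parsed = ast.literal_eval(cell)

print(type(parsed).__name__)
print(parsed[0]['id'])
```

If the column really does contain strings, something like df['phone_numbers'].apply(ast.literal_eval) could be run before flattening. If the cells are already lists, this step is unnecessary.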