pandas: how to fix json_normalize when it cannot iterate over a column to flatten it?

Question

Asked by RustyShackleford

I have a dataframe that looks like this:

ID       phone_numbers
1        [{u'updated_at': u'2017-12-02 15:29:54', u'created_at': u'2017-12-02 15:29:54',
           u'sms': 0, u'number': u'1112223333', u'consumer_id': 12345,
           u'organization_id': 1, u'active': 1, u'deleted_at': None,
           u'type': u'default', u'id': 1234}]

I want to take the phone_numbers column and flatten the information inside it so that I can query, say, the 'id' field.

When I try:

json_normalize(df.phone_numbers)

I get this error:

AttributeError: 'str' object has no attribute 'itervalues'

I am not sure why this error is produced or why I cannot flatten this column.

EDIT:

The data was originally a JSON string read from a response object (r.text):

https://docs.google.com/document/d/1Iq4PMcGXWx6O48sWqqYnZjG6UMSZoXfmN1WadQLkWYM/edit?usp=sharing

EDIT:

I converted the column I need to flatten into JSON with this command:

a = df.phone_numbers.to_json()

{"0":[{"updated_at":"2018-04-12 12:24:04","created_at":"2018-04-12 12:24:04","sms":0,"number":"","consumer_id":123,"org_id":123,"active":1,"deleted_at":null,"type":"default","id":123}]}

Answer 1

Answered by jezrael

Use a list comprehension to flatten the lists, adding ID to each dictionary as a new element:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'phone_numbers': [[{'a': '2017', 'b': '2017', 'sms': 1}, 
                                                    {'a': '2018', 'b': '2017', 'sms': 2}], 
                                                  [{'a': '2017', 'b': '2017', 'sms': 3}]]})
print (df)
   ID                                      phone_numbers
0   1  [{'a': '2017', 'b': '2017', 'sms': 1}, {'a': '...
1   2             [{'a': '2017', 'b': '2017', 'sms': 3}]

# one output row per dict in each list, tagged with the ID of its parent row
df = pd.DataFrame([dict(y, ID=i) for i, x in df.values.tolist() for y in x])
print (df)  

   ID     a     b  sms
0   1  2017  2017    1
1   1  2018  2017    2
2   2  2017  2017    3
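
As an alternative to the list comprehension, the same flattening can be sketched with json_normalize itself, assuming a pandas version that exposes it as pd.json_normalize (1.0 or later) and assuming every cell in phone_numbers really is a list of dicts. The record_path and meta arguments tell it which key holds the nested records and which key to carry along for each record:

```python
import pandas as pd

# Same sample frame as in the answer above
df = pd.DataFrame({
    'ID': [1, 2],
    'phone_numbers': [
        [{'a': '2017', 'b': '2017', 'sms': 1}, {'a': '2018', 'b': '2017', 'sms': 2}],
        [{'a': '2017', 'b': '2017', 'sms': 3}],
    ],
})

# Turn each row into a plain dict, then expand the nested list under
# 'phone_numbers' into rows while keeping the parent 'ID' as a column.
flat = pd.json_normalize(
    df.to_dict(orient='records'),
    record_path='phone_numbers',
    meta=['ID'],
)
print(flat)
```

This produces one row per phone-number dict, like the list comprehension, and it will fail in the same way if the cells are strings rather than lists.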

EDIT:

df = pd.DataFrame({'phone_numbers':{"0":[{"type":"default","id":123}]}})

df = pd.DataFrame([y for x in df['phone_numbers'].values.tolist() for y in x])
print (df) 
    id     type
0  123  default

Answer 2

Answered by alvaro nortes

I am not sure, but I think json_normalize expects JSON-like data (a dict or a list of dicts) as its first argument, not a pd.Series. Convert the Series to a dict or a list of dicts first; you could use to_dict():

json_normalize(df.phone_numbers.to_dict())
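
The 'str' in the error message from the question suggests that the cells of phone_numbers may hold the text of a Python list rather than an actual list. That is an assumption, not something the question confirms. Under that assumption, a minimal sketch is to parse each cell with ast.literal_eval before flattening. The helper name parse_cell and the shortened sample row are made up for illustration:

```python
import ast
import pandas as pd

# Hypothetical sample: the cell is a *string* that looks like a list of dicts
df = pd.DataFrame({
    'ID': [1],
    'phone_numbers': [
        "[{u'sms': 0, u'number': u'1112223333', u'deleted_at': None, u'id': 1234}]"
    ],
})

def parse_cell(cell):
    """Parse a Python-literal string into a list; pass real lists through unchanged."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

parsed = [parse_cell(cell) for cell in df['phone_numbers']]

# Flatten as in Answer 1: one row per dict, tagged with the parent ID
rows = [dict(rec, ID=i) for i, recs in zip(df['ID'], parsed) for rec in recs]
flat = pd.DataFrame(rows)

# The 'id' field can now be queried like any other column
print(flat.loc[flat['id'] == 1234])
```

ast.literal_eval only accepts Python literals, so it copes with the u'...' prefixes and None shown in the question. If the strings were real JSON (null, double quotes), json.loads would be the parser to use instead.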
