pandas: remove null values when calling to_json

Question

Asked by mva

I have a pandas DataFrame that I want to save in JSON format. The pandas docs say:

Note: NaN's, NaT's and None will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.

Then, using the orient option records, I get something like this:

[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]

Is it possible to have this instead:

[{"A":1,"B":4,"C":7},{"B":5},{"A":3}]

Thank you

Answer 1

Accepted answer by EdChum

The following gets close to what you want. Essentially, we create a list of the non-NaN values in each row and then call to_json on the result:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

Creating a list is necessary here; otherwise apply will try to align the result with the shape of your original df, which reintroduces the NaN values you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Also, in the pandas version used here, calling list on the result of dropna broadcasts the result to the original shape, as if filling:

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'
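
If the goal is specifically the records layout, one other option (not part of the answer above, just a sketch) is to let to_json produce the records and then strip the null entries afterwards with the json module. The sample df below is assumed for illustration; note that this also drops keys whose value is legitimately null, which may or may not be what you want:

```python
import json

import pandas as pd

# Sample frame assumed for illustration (not from the original post).
df = pd.DataFrame({"a": [1, None, 3], "b": [4, 5, None], "c": [7, None, None]})

# Let pandas serialize first, so NaN/NaT/None all become JSON null.
records = json.loads(df.to_json(orient="records"))

# Drop every key whose value came back as null (None after parsing).
cleaned = [{k: v for k, v in rec.items() if v is not None} for rec in records]

result = json.dumps(cleaned)
print(result)
```

Because serialization still goes through to_json, the date_format and date_unit handling mentioned in the question stays in effect.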

Answer 2

Answer by Dave DeCaprio

The solution above doesn't actually produce results in the 'records' format. This solution also uses the json package, but produces exactly the result asked for in the original question.

import pandas as pd
import json

json.dumps([row.dropna().to_dict() for index,row in df.iterrows()])
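
For anyone who wants to run this end to end, here is a minimal sketch with a sample df that I am assuming matches the question's data (the frame itself was not posted):

```python
import json

import pandas as pd

# Assumed sample data shaped like the question's example.
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

# One dict per row, keeping only the non-NaN entries of that row.
records = [row.dropna().to_dict() for _, row in df.iterrows()]

result = json.dumps(records)
print(result)
```

Keep in mind that columns containing NaN are stored as floats, so the values are written as 1.0 rather than 1 unless you convert them yourself.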

Additionally, if you want to include the index (and you are on Python 3.5+), you can do:

json.dumps([{'index':index, **row.dropna().to_dict()} for index,row in df.iterrows()])
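
A small sketch of the index variant, again with assumed sample data. The string index labels ("r0", "r1", "r2") are hypothetical and chosen only so the output is easy to read:

```python
import json

import pandas as pd

# Assumed sample data with hypothetical string index labels.
df = pd.DataFrame(
    {"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]},
    index=["r0", "r1", "r2"],
)

# The ** unpacking merges the row's non-NaN values into the dict
# that already holds the index label.
records = [{"index": index, **row.dropna().to_dict()} for index, row in df.iterrows()]

result = json.dumps(records)
print(result)
```

If a column is itself named "index", its value would overwrite the label here, so pick a different key in that case.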

Answer 3

Answer by cssmlulu

I had the same problem, and my solution is to use the json module instead of pd.DataFrame.to_json().

My solution is to

drop the NaN values when converting the DataFrame to a dict, and then
convert the dict to JSON using json.dumps()

Here is the code:

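import json

import pandas as pd


def to_dict_dropna(df):
    # df.items() is used because pandas.compat.iteritems is not available
    # in recent pandas releases.
    # int(k) and astype(int) assume integer-like column labels and values.
    return {int(k): v.dropna().astype(int).to_dict() for k, v in df.items()}


json.dumps(to_dict_dropna(df))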
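
The function above assumes integer column labels and integer values. As a rough sketch for other data, the casts can simply be left out. The name to_dict_dropna_general and the sample df are my own and only for illustration:

```python
import json

import pandas as pd

# Assumed sample data with hypothetical string index labels.
df = pd.DataFrame(
    {"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]},
    index=["r0", "r1", "r2"],
)


def to_dict_dropna_general(df):
    # Hypothetical variant: one dict per column, NaN entries dropped,
    # no int() casts on labels or values.
    return {col: series.dropna().to_dict() for col, series in df.items()}


result = json.dumps(to_dict_dropna_general(df))
print(result)
```

Note that this gives a column-oriented layout (column, then index label, then value), not the records layout from the question.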
