Pandas: remove null values when calling to_json

Note: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30912746/

Date: 2020-09-13 23:29:13  Source: igfitidea


python json pandas

Asked by mva

I have a pandas DataFrame and I want to save it in JSON format. The pandas docs say:

Note: NaN's, NaT's and None will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.

Then, using the orient option records, I get something like this:

[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]
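For reference, output of this shape can be reproduced with a minimal sketch like the one below. The sample frame is an assumption reconstructed from the JSON shown; columns that hold NaN are stored as floats, so the numbers are serialized as 1.0, 4.0 and so on rather than as integers.

```python
import pandas as pd

# Assumed sample data, reconstructed from the JSON output in the question.
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

# orient="records" emits one JSON object per row; missing values become null.
records_json = df.to_json(orient="records")
print(records_json)
```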

Is it possible to get this instead:

[{"A":1,"B":4,"C":7},{"B":5},{"A":3}]

Thank you


Accepted answer by EdChum

The following gets close to what you want. Essentially, we create a list of the non-NaN values for each row and then call to_json on the result:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

Creating a list is necessary here; otherwise pandas will try to align the result with the shape of your original df, which reintroduces the NaN values you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Also, calling list on the result of dropna will broadcast the result to the original shape, as if filling the gaps:

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'

Answer by Dave DeCaprio

The accepted answer doesn't actually produce results in the 'records' format. This solution uses the json package and produces exactly the result asked for in the original question.

import pandas as pd
import json

json.dumps([row.dropna().to_dict() for index,row in df.iterrows()])

Additionally, if you want to include the index (and you are on Python 3.5+) you can do:


json.dumps([{'index':index, **row.dropna().to_dict()} for index,row in df.iterrows()])
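As a self-contained illustration of the row-by-row approach, here is a minimal sketch. The sample frame and the helper name to_records_dropna are assumptions made for this example, not part of the original answer. Because columns containing NaN are stored as floats, the surviving values are serialized as floats (for example 1.0 rather than 1).

```python
import json
import pandas as pd

# Assumed sample data mirroring the question.
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

def to_records_dropna(frame):
    """Return a JSON string of row records, omitting null entries per row."""
    # iterrows() yields (index, row) pairs; dropna() removes the missing
    # entries of each row before it is turned into a plain dict.
    records = [row.dropna().to_dict() for _, row in frame.iterrows()]
    return json.dumps(records)

result = to_records_dropna(df)
print(result)
```

Iterating with iterrows() is simple but can be slow on large frames, so this sketch is best suited to small or medium-sized data.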

Answer by cssmlulu

I had the same problem, and my solution is to use the json module instead of pd.DataFrame.to_json().

My solution is to:

  1. drop the NaN values when converting the DataFrame to a dict, and then
  2. convert the dict to JSON using json.dumps()

Here is the code:


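import pandas as pd
import json

def to_dict_dropna(df):
   # Build a column-oriented dict, dropping the NaN entries of each column.
   # df.items() is used because compat.iteritems is no longer available in
   # recent pandas releases.
   # int(k) assumes integer-like column labels; astype(int) assumes the
   # remaining values are whole numbers.
   return {int(k): v.dropna().astype(int).to_dict() for k, v in df.items()}

json.dumps(to_dict_dropna(df))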
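The two steps above can also be sketched for a frame with string column labels, such as the one in the question. The sample data and the helper name columns_to_dict_dropna are assumptions made for this example. Note that the result is column-oriented (one object per column, keyed by row index), not the 'records' layout the question asks for.

```python
import json
import pandas as pd

# Assumed sample data mirroring the question, with a default integer index.
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

def columns_to_dict_dropna(frame):
    """Map each column label to a dict of its non-null values."""
    # Step 1: drop NaN per column while converting the frame to a dict.
    return {str(label): column.dropna().to_dict() for label, column in frame.items()}

# Step 2: serialize the dict; json.dumps turns the integer row keys into strings.
result = json.dumps(columns_to_dict_dropna(df))
print(result)
```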