Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33199193/

Time: 2020-09-14 00:04:08  Source: igfitidea

How to fill dataframe NaN values with empty list [] in pandas?

Tags: python, pandas, nan

Asked by ALH

This is my dataframe:

          date                          ids
0     2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1     2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2     2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3     2011-04-26  NaN
4     2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5     2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...

I want to replace NaN with []. How can I do that? fillna([]) did not work. I even tried replace(np.nan, []), but it gives an error:

 TypeError('Invalid "to_replace" type: \'float\'',)
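
For context, here is a minimal sketch of the failing `fillna([])` call, using a small hypothetical Series `s` in place of the `ids` column. In the pandas versions I am aware of, `fillna` rejects a list as the fill value. The exact exception type and message may vary between versions, so the sketch only records whether an error occurred.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `ids` column: lists plus one missing value.
s = pd.Series([[0, 1, 2], np.nan, [3, 4]])

error = None
try:
    s.fillna([])  # a list is not accepted as the fill value
except Exception as exc:  # broad on purpose: the exception type may differ by version
    error = exc

# Show which kind of exception (if any) was raised.
print(type(error).__name__)
```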

Accepted answer by Alexander

You can first use loc to locate all rows that have a NaN in the ids column, and then loop through these rows using at to set their values to an empty list:

for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []

>>> df
        date                                             ids
0 2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26                                              []
4 2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
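
A self-contained sketch of the same idea, for anyone who wants to run it without the original data. The sample DataFrame below is hypothetical (shortened lists, three rows) and only mirrors the shape of the one in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question (shortened lists).
df = pd.DataFrame({
    'date': pd.to_datetime(['2011-04-23', '2011-04-24', '2011-04-25']),
    'ids': [[0, 1, 2], np.nan, [0, 1]],
})

# Index labels of the rows whose `ids` value is missing.
missing = df.index[df['ids'].isnull()]

# `at` sets one cell at a time, so each row receives its own new list.
for row in missing:
    df.at[row, 'ids'] = []

print(df)
```

Because a fresh `[]` is created on every iteration, the filled rows do not share a single list object.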

Answer by Nick Edgar

My approach is similar to @hellpanderr's, but it tests for list-ness rather than using isnan:

df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])

I originally tried using pd.isnull (or pd.notnull) but, when given a list, that returns the null-ness of each element.
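
A small sketch of the behaviour described above, assuming a hypothetical Series `ids` standing in for the column. It shows `pd.isnull` on a scalar versus on a list, and then the `isinstance` test applied to the column.

```python
import numpy as np
import pandas as pd

# On a scalar, pd.isnull returns a single boolean.
scalar_result = pd.isnull(np.nan)

# On a list, it returns one boolean per element instead.
list_result = pd.isnull([1, None, 3])

# Hypothetical stand-in for the `ids` column.
ids = pd.Series([[0, 1], np.nan, [2]])

# Testing for list-ness sidesteps the element-wise behaviour.
cleaned = ids.apply(lambda d: d if isinstance(d, list) else [])

print(scalar_result)
print(list_result)
print(cleaned.tolist())
```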

Answer by PlasmaBinturong

After a lot of head-scratching I found this method that should be the most efficient (no looping, no apply), just assigning to a slice:

isnull = df.ids.isnull()

df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]

The trick was to construct your list of [] of the right size (isnull.sum()), and then enclose it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) containing empty lists as elements.
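
One caveat when building the fill value with `[[]] * n`: in plain Python this repeats a reference to one list object rather than creating n separate lists. The sketch below shows the difference using only built-in lists. Whether that sharing survives the assignment into the DataFrame may depend on the pandas version, and is not verified here.

```python
# `[[]] * n` repeats one list object n times.
shared = [[]] * 3
shared[0].append('x')
# Every slot reflects the change because all three refer to the same list.
print(shared)

# A comprehension creates a new list for every slot.
independent = [[] for _ in range(3)]
independent[0].append('x')
# Only the first slot changes.
print(independent)
```

If the filled cells will later be mutated in place, building them with a comprehension is the safer choice.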

Answer by hellpanderr

Without assignments:

1) Assuming we have only floats and integers in our dataframe

import math
df.apply(lambda col: col.apply(lambda x: [] if math.isnan(x) else x))

2) For any dataframe

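import math

def isnan(x):
    # Python 3 has no `long`; math.isnan rejects complex numbers,
    # so only real numeric types are checked here.
    return isinstance(x, (int, float)) and math.isnan(x)

# Returns a new DataFrame; non-numeric cells are left untouched.
df.apply(lambda col: col.apply(lambda x: [] if isnan(x) else x))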

Answer by Allen

Another solution using numpy:

df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)

Or using combine_first:

df.ids = df.ids.combine_first(pd.Series([[]]*len(df)))

Answer by botivegh

This is probably a faster, one-liner solution:

df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)

Answer by keramat

Maybe more concise:

df['ids'] = [[] if type(x) != list else x for x in df['ids']]

Answer by TICH

Create a function that checks your condition; if the condition is not met, it returns an empty list, empty set, etc.

Then apply that function to the column and assign the result back to the old column, or to a new one if you wish.

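import numpy as np
import pandas as pd

aa = pd.DataFrame({'d': [1, 1, 2, 3, 3, np.nan], 'r': [3, 5, 5, 5, 5, 'e']})


def check_condition(x):
    # NaN > 0 evaluates to False, so missing values fall through to the empty list
    if x > 0:
        return x
    else:
        return list()

# Assign the result back to column 'd'
aa['d'] = aa.d.apply(check_condition)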