pandas: replace empty list with NaN in pandas dataframe

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, link the original and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/40818924/

Date: 2020-09-14 02:30:46  Source: igfitidea

python, pandas

Question by running man

I'm trying to replace some empty lists in my data with NaN values. But how do I represent an empty list in the expression?

import numpy as np
import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d

    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4



d.loc[d['x'] == [],['x']] = d.loc[d['x'] == [],'x'].apply(lambda x: np.nan)
d

ValueError: Arrays were different lengths: 4 vs 0

Also, when I try to select the [text] row with d[d['x'] == ["text"]], I get ValueError: Arrays were different lengths: 4 vs 1, but selecting the 3 row with d[d['y'] == 3] works. Why?

Answer by Abdou

If you wish to replace the empty lists in column x with NumPy NaNs, you can do the following:

d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)
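
The same replacement can also be written with a boolean mask and .loc, which is close to what the question attempted with d['x'] == []. This is a minimal sketch, assuming the cells of x are either lists or already-missing values; the helper name is_empty_list is hypothetical, and its isinstance check keeps the mask safe if the replacement has already been run once (len(np.nan) would raise a TypeError).

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

def is_empty_list(value):
    # Hypothetical helper: True only for list cells of length 0,
    # so NaN cells left by an earlier replacement are skipped.
    return isinstance(value, list) and len(value) == 0

# Build the mask cell by cell instead of comparing the column to [].
mask = d['x'].apply(is_empty_list)

# Assign NaN only in the rows where the mask is True.
d.loc[mask, 'x'] = np.nan
print(d)
```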

If you want to subset the dataframe to the rows whose x value equals ['text'], try the following:

d[[y==['text'] for y in d.x]]
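
The list comprehension works because each cell is compared to ['text'] with Python's own list equality, one row at a time. A roughly equivalent sketch with Series.apply is below; the second filter, which keeps rows whose list merely contains the string 'text', is an added illustration of looking inside list cells and is not part of the original answer.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# Exact match: compare each cell to the list ['text'] in plain Python.
exact = d[d['x'].apply(lambda cell: cell == ['text'])]

# Membership: keep rows whose list contains the string 'text'.
contains = d[d['x'].apply(lambda cell: 'text' in cell)]

print(exact)
print(contains)
```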

I hope this helps.

Answer by Alex

To answer your main question, just leave out the empty lists altogether. If you build the dataframe with pandas.concat instead of from a dictionary, NaN is filled in automatically wherever one column has a value and the other does not.

>>> import pandas as pd
>>> ser1 = pd.Series([[1,2,3], [1,2], ["text"]], name='x')
>>> ser2 = pd.Series([1,2,3,4], name='y')
>>> result = pd.concat([ser1, ser2], axis=1)
>>> result
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3        NaN  4
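
The NaN appears because pd.concat with axis=1 aligns the Series on their index labels: label 3 exists only in ser2, so column x gets NaN there. The sketch below checks this; it also shows, as an aside not made in the answer, that passing a dict of Series (rather than a dict of plain lists) to the DataFrame constructor aligns on the index in the same way.

```python
import pandas as pd

ser1 = pd.Series([[1, 2, 3], [1, 2], ["text"]], name='x')
ser2 = pd.Series([1, 2, 3, 4], name='y')

# Column-wise concatenation takes the union of the two indexes (0..3).
result = pd.concat([ser1, ser2], axis=1)

# A dict of Series is aligned on the index in the same way.
from_dict = pd.DataFrame({'x': ser1, 'y': ser2})

print(result)
print(result['x'].isna().tolist())
```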

As for your second question, it seems that you can't search inside an element this way. Perhaps you should ask it as a separate question, since it isn't really related to your main question.

Answer by Shawn Mark

You can use the apply function to match a given cell value, whether it is a string, a list, and so on.

For example, in your case:

import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d
    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4

If you use d == 3 to select the cells whose value is 3, it works fine:

      x       y
0   False   False
1   False   False
2   False   True
3   False   False

However, if you use the equality operator to match a list, for example d == [text], d == ['text'] or d == '[text]', the result may not be what you expect.
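
A short sketch of why the scalar comparison works but the list comparison does not: pandas treats a list on the right-hand side of == as an array of values to compare position by position, so its length must match the Series (4 rows vs 1 value here). The exact error message differs between pandas versions, so the sketch only checks that a ValueError is raised. Comparing with the string '[text]' does not fail, but matches nothing, because the cells hold lists rather than strings.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# A scalar is broadcast to every row, so this returns a boolean Series.
print((d['y'] == 3).tolist())

# A list is treated as array-like: 4 rows vs 1 value -> length mismatch.
try:
    d['x'] == ['text']
    list_compare_failed = False
except ValueError as err:
    list_compare_failed = True
    print('ValueError:', err)

# A string is a scalar, but no cell equals the string '[text]'.
print((d['x'] == '[text]').tolist())
```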

There are some solutions:

  1. Use apply() on the specific Series in your DataFrame, just like the top answer does.

  2. A more general method, applymap() on the whole DataFrame, can be used as a preprocessing step:

    d.applymap(lambda x: x == [])

           x      y
    0  False  False
    1  False  False
    2  False  False
    3   True  False

I hope this helps you and future readers. It would also be better to add a type check in your applymap function; otherwise it may raise exceptions for some cell types.
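
A minimal sketch of the type-checked version suggested above, assuming the goal is to turn every empty-list cell in the whole frame into NaN. The helper name is_empty_list is hypothetical. In pandas 2.1, applymap was deprecated and renamed DataFrame.map, so the sketch picks whichever method the installed version has; DataFrame.mask then replaces the cells where the boolean frame is True.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

def is_empty_list(value):
    # Type check first: only real lists are measured, so ints, NaN,
    # strings or arrays in other cells always give a plain False.
    return isinstance(value, list) and len(value) == 0

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = d.map if hasattr(d, 'map') else d.applymap
is_empty = elementwise(is_empty_list)
print(is_empty)

# Replace every empty-list cell in the frame with NaN.
cleaned = d.mask(is_empty, np.nan)
print(cleaned)
```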
