Python pandas: drop columns that are all NaN

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45147100/

Time: 2020-08-19 16:44:40  Source: igfitidea

Pandas: drop columns with all NaN's

python pandas dataframe in-place

Asked by theprowler

I realize that dropping NaNs from a dataframe is as easy as df.dropna, but for some reason that isn't working on mine and I'm not sure why.

Here is my original dataframe:

fish_frame1:                       0   1   2         3   4       5   6          7
0               #0915-8 NaN NaN       NaN NaN     NaN NaN        NaN
1                   NaN NaN NaN  LIVE WGT NaN  AMOUNT NaN      TOTAL
2               GBW COD NaN NaN     2,280 NaN   $0.60 NaN  $1,368.00
3               POLLOCK NaN NaN     1,611 NaN   $0.01 NaN     $16.11
4                 WHAKE NaN NaN       441 NaN   $0.70 NaN    $308.70
5           GBE HADDOCK NaN NaN     2,788 NaN   $0.01 NaN     $27.88
6           GBW HADDOCK NaN NaN    16,667 NaN   $0.01 NaN    $166.67
7               REDFISH NaN NaN       932 NaN   $0.01 NaN      $9.32
8    GB WINTER FLOUNDER NaN NaN       145 NaN   $0.25 NaN     $36.25
9   GOM WINTER FLOUNDER NaN NaN    25,070 NaN   $0.35 NaN  $8,774.50
10        GB YELLOWTAIL NaN NaN        26 NaN   $1.75 NaN     $45.50

The code that follows is an attempt to drop all NaNs as well as any columns with more than 3 NaNs (either one, or both, should work I think):

fish_frame.dropna()
fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

This produces:

fish_frame1 after dropna:                       0   1   2         3   4       5   6          7
0               #0915-8 NaN NaN       NaN NaN     NaN NaN        NaN
1                   NaN NaN NaN  LIVE WGT NaN  AMOUNT NaN      TOTAL
2               GBW COD NaN NaN     2,280 NaN   $0.60 NaN  $1,368.00
3               POLLOCK NaN NaN     1,611 NaN   $0.01 NaN     $16.11
4                 WHAKE NaN NaN       441 NaN   $0.70 NaN    $308.70
5           GBE HADDOCK NaN NaN     2,788 NaN   $0.01 NaN     $27.88
6           GBW HADDOCK NaN NaN    16,667 NaN   $0.01 NaN    $166.67
7               REDFISH NaN NaN       932 NaN   $0.01 NaN      $9.32
8    GB WINTER FLOUNDER NaN NaN       145 NaN   $0.25 NaN     $36.25
9   GOM WINTER FLOUNDER NaN NaN    25,070 NaN   $0.35 NaN  $8,774.50
10        GB YELLOWTAIL NaN NaN        26 NaN   $1.75 NaN     $45.50

I am a novice with Pandas, so I'm not sure if this isn't working because I'm doing something wrong, misunderstanding something, or misusing a command. Any help is appreciated, thanks.

Accepted answer by Corley Brigman

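From the dropna docstring:

    # drop the columns where all elements are NaN:

    >>> df.dropna(axis=1, how='all')
         A    B  D
    0  NaN  2.0  0
    1  3.0  4.0  1
    2  NaN  NaN  5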
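For anyone who wants to run the docstring idea directly, here is a minimal sketch. The input frame is my own assumption, modeled on the docstring output with an extra all-NaN column "C"; it is not taken from the pandas docs verbatim.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "C" is entirely NaN, the others are not.
df = pd.DataFrame(
    {
        "A": [np.nan, 3.0, np.nan],
        "B": [2.0, 4.0, np.nan],
        "C": [np.nan, np.nan, np.nan],
        "D": [0, 1, 5],
    }
)

# axis=1 works on columns; how="all" drops a column only if every value is NaN.
result = df.dropna(axis=1, how="all")
print(result)
```

Columns "A" and "B" survive even though they contain some NaNs, because how="all" only targets columns with nothing but NaN.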

Answer by Rakesh Adhikesavan

dropna() drops the null values and returns a new DataFrame. Assign it back to the original DataFrame.

fish_frame = fish_frame.dropna(axis=1, how='all')
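A small sketch of the difference between calling dropna and assigning its result. The frame and its column names ("x", "empty") are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; "empty" holds only NaN.
frame = pd.DataFrame({"x": [1.0, 2.0], "empty": [np.nan, np.nan]})

# Bare call: the cleaned copy is discarded and frame is left untouched.
frame.dropna(axis=1, how="all")
cols_after_bare_call = list(frame.columns)

# Assigning the result back is what makes the change stick.
frame = frame.dropna(axis=1, how="all")
cols_after_assignment = list(frame.columns)

print(cols_after_bare_call)
print(cols_after_assignment)
```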

Referring to your code:

fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

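thresh is the minimum number of non-NaN values a column needs in order to be kept, so this keeps only the columns with at least 7 non-NaN values (assuming len(df) = 10). In other words, it drops columns with more than 3 NaNs, like you've mentioned. Setting thresh to 3 would instead keep every column that has at least 3 non-NaN values. Either way, the result still has to be assigned back to persist.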
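To check how thresh behaves, here is a rough sketch on a 10-row frame. The frame is invented for the purpose, and each column name records how many NaNs that column holds.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame; "nan_4" means the column contains 4 NaNs, and so on.
df = pd.DataFrame(
    {
        "nan_0": [1.0] * 10,
        "nan_3": [np.nan] * 3 + [1.0] * 7,
        "nan_4": [np.nan] * 4 + [1.0] * 6,
        "nan_7": [np.nan] * 7 + [1.0] * 3,
        "nan_10": [np.nan] * 10,
    }
)

# thresh counts non-NaN values: a column is kept if it has at least `thresh` of them.
kept_len_minus_3 = list(df.dropna(axis=1, thresh=len(df) - 3).columns)
kept_thresh_3 = list(df.dropna(axis=1, thresh=3).columns)

print(kept_len_minus_3)
print(kept_thresh_3)
```

With thresh=len(df) - 3 only the columns holding at most 3 NaNs remain, while thresh=3 removes only the column that has fewer than 3 real values.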

Answer by SeeDerekEngineer

dropna() by default returns a new dataframe (inplace=False is the default), so the result needs to be assigned to a variable for the change to stay in your code.

So for example,

fish_frame = fish_frame.dropna()
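If you would rather not reassign, inplace=True is the other option. A minimal sketch, using a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in row 1.
frame = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# With inplace=True the frame itself is modified and the call returns None,
# so do not assign the return value back to the variable.
returned = frame.dropna(inplace=True)

print(returned)
print(frame)
```

Writing frame = frame.dropna(inplace=True) would replace the frame with None, which is an easy mistake to make when mixing the two styles.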

As to why your dropna is returning an empty dataframe, I'd recommend you look at the "how" argument of the dropna method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html). With the defaults (axis=0, how="any"), every row that contains at least one NaN is dropped, and every row of your frame contains one. Also bear in mind that with dropna, axis=0 drops rows and axis=1 drops columns.

So to remove the columns that are all NA, axis=1, how="all" should do the trick:

fish_frame = fish_frame.dropna(axis=1, how="all")

Finally, the "thresh" argument designates explicitly how many non-NA values a row or column must have in order to be kept. So

fish_frame = fish_frame.dropna(axis=1, thresh=3)

should work fine and dandy to remove any column with fewer than three non-NA values.
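Here is a sketch of how this plays out on a frame shaped like the one in the question. It is a shortened, hand-typed copy (only the first four rows) rather than the asker's real data, so treat the values as illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Abbreviated stand-in for fish_frame: columns 1, 2, 4 and 6 are entirely NaN.
rows = [
    ["#0915-8", nan, nan, nan, nan, nan, nan, nan],
    [nan, nan, nan, "LIVE WGT", nan, "AMOUNT", nan, "TOTAL"],
    ["GBW COD", nan, nan, "2,280", nan, "$0.60", nan, "$1,368.00"],
    ["POLLOCK", nan, nan, "1,611", nan, "$0.01", nan, "$16.11"],
]
fish_frame = pd.DataFrame(rows)

# Default dropna(): every row has at least one NaN, so nothing is left.
everything_dropped = fish_frame.dropna()

# Drop the all-NaN columns first, then the rows that still contain a NaN.
no_empty_cols = fish_frame.dropna(axis=1, how="all")
complete_rows = no_empty_cols.dropna()

print(everything_dropped.empty)
print(list(no_empty_cols.columns))
print(complete_rows)
```

Doing the column pass before the row pass matters here: once the empty columns are gone, the data rows are complete and survive the default row-wise dropna.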

Also, as Corley pointed out, how="any" is the default and is thus not necessary.
