Pandas: drop columns with all NaN's
Original question: http://stackoverflow.com/questions/45147100/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by theprowler
I realize that dropping NaNs from a dataframe is as easy as df.dropna, but for some reason that isn't working on mine and I'm not sure why.
Here is my original dataframe:
    fish_frame1:
                          0    1    2         3    4       5    6          7
    0               #0915-8  NaN  NaN       NaN  NaN     NaN  NaN        NaN
    1                   NaN  NaN  NaN  LIVE WGT  NaN  AMOUNT  NaN      TOTAL
    2               GBW COD  NaN  NaN     2,280  NaN   $0.60  NaN  $1,368.00
    3               POLLOCK  NaN  NaN     1,611  NaN   $0.01  NaN     $16.11
    4                 WHAKE  NaN  NaN       441  NaN   $0.70  NaN    $308.70
    5           GBE HADDOCK  NaN  NaN     2,788  NaN   $0.01  NaN     $27.88
    6           GBW HADDOCK  NaN  NaN    16,667  NaN   $0.01  NaN    $166.67
    7               REDFISH  NaN  NaN       932  NaN   $0.01  NaN      $9.32
    8    GB WINTER FLOUNDER  NaN  NaN       145  NaN   $0.25  NaN     $36.25
    9   GOM WINTER FLOUNDER  NaN  NaN    25,070  NaN   $0.35  NaN  $8,774.50
    10        GB YELLOWTAIL  NaN  NaN        26  NaN   $1.75  NaN     $45.50
The code that follows is an attempt to drop all NaNs as well as any columns with more than 3 NaNs (either one, or both, should work I think):
    fish_frame.dropna()
    fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)
This produces:
    fish_frame1 after dropna:
                          0    1    2         3    4       5    6          7
    0               #0915-8  NaN  NaN       NaN  NaN     NaN  NaN        NaN
    1                   NaN  NaN  NaN  LIVE WGT  NaN  AMOUNT  NaN      TOTAL
    2               GBW COD  NaN  NaN     2,280  NaN   $0.60  NaN  $1,368.00
    3               POLLOCK  NaN  NaN     1,611  NaN   $0.01  NaN     $16.11
    4                 WHAKE  NaN  NaN       441  NaN   $0.70  NaN    $308.70
    5           GBE HADDOCK  NaN  NaN     2,788  NaN   $0.01  NaN     $27.88
    6           GBW HADDOCK  NaN  NaN    16,667  NaN   $0.01  NaN    $166.67
    7               REDFISH  NaN  NaN       932  NaN   $0.01  NaN      $9.32
    8    GB WINTER FLOUNDER  NaN  NaN       145  NaN   $0.25  NaN     $36.25
    9   GOM WINTER FLOUNDER  NaN  NaN    25,070  NaN   $0.35  NaN  $8,774.50
    10        GB YELLOWTAIL  NaN  NaN        26  NaN   $1.75  NaN     $45.50
I am a novice with Pandas, so I'm not sure whether this isn't working because I'm doing something wrong, misunderstanding something, or misusing a command. Any help is appreciated, thanks.
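Accepted answer by Corley Brigman
From the dropna docstring:
    # drop the columns where all elements are NaN:
    >>> df.dropna(axis=1, how='all')
         A    B  D
    0  NaN  2.0  0
    1  3.0  4.0  1
    2  NaN  NaN  5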
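To see that docstring example applied to a frame shaped like the one in the question, here is a minimal sketch. The frame is a cut-down, hand-typed stand-in for fish_frame (three rows, five columns), so its construction is an assumption and not the asker's actual loading code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for fish_frame: columns 1 and 3 hold only NaN.
fish_frame = pd.DataFrame(
    [
        ["#0915-8", np.nan, np.nan, np.nan, np.nan],
        ["GBW COD", np.nan, "2,280", np.nan, "$0.60"],
        ["POLLOCK", np.nan, "1,611", np.nan, "$0.01"],
    ]
)

# how="all" with axis=1 drops only the columns in which every value is NaN.
cleaned = fish_frame.dropna(axis=1, how="all")

print(cleaned.columns.tolist())  # [0, 2, 4]
print(fish_frame.shape)          # (3, 5): the original frame is unchanged
```

Columns that contain at least one real value survive, even if most of their entries are NaN.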
Answer by Rakesh Adhikesavan
dropna() drops the null values and returns a new DataFrame; it does not modify the original. Assign the result back to the original DataFrame:
    fish_frame = fish_frame.dropna(axis=1, how='all')
Referring to your code:
    fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)
thresh is the minimum number of non-NaN values a column must have in order to be kept. Assuming len(df) = 10, thresh is 7, so this call drops every column with fewer than 7 non-NaN values, that is, every column with more than 3 NaNs, which is what you described. Your frame is unchanged only because the result is never assigned back.
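Because thresh is easy to read backwards, a small check helps. The sketch below uses a made-up ten-row frame (the column names full, few_nans and many_nans are hypothetical) to show that thresh counts the non-NaN values a column needs in order to survive, and that the original frame is left alone unless the result is assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical ten-row frame: one column with 3 NaNs, one with 4 NaNs.
df = pd.DataFrame(
    {
        "full": range(10),
        "few_nans": [np.nan] * 3 + [1.0] * 7,   # 7 non-NaN values
        "many_nans": [np.nan] * 4 + [1.0] * 6,  # 6 non-NaN values
    }
)

# Keep columns with at least len(df) - 3 = 7 non-NaN values,
# which is the same as dropping columns with more than 3 NaNs.
kept = df.dropna(axis=1, thresh=len(df) - 3)

print(kept.columns.tolist())  # ['full', 'few_nans']
print(df.shape)               # (10, 3): the original frame is untouched
```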
Answer by SeeDerekEngineer
dropna() by default returns a new dataframe (the default is inplace=False), and thus its result needs to be assigned to a variable for it to stay in your code.
So for example,
    fish_frame = fish_frame.dropna()
As to why your dropna is returning an empty dataframe, I'd recommend you look at the "how" argument of the dropna method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html). With the defaults, dropna removes every row that contains at least one NaN, and every row of your frame contains one. Also bear in mind that for dropna, axis=0 drops rows and axis=1 drops columns.
So to remove the columns that are entirely NaN, axis=1, how="all" should do the trick:
    fish_frame = fish_frame.dropna(axis=1, how="all")
Finally, the "thresh" argument sets how many non-NaN values a row or column must have in order to be kept. So
    fish_frame = fish_frame.dropna(axis=1, thresh=len(fish_frame) - 3)
should work fine and dandy to remove any column with more than three NaNs.
Also, as Corley pointed out, how="any" is the default and is thus not necessary. Recent pandas versions also refuse to accept how and thresh in the same call, so pass only one of them.
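The effect of axis and how can be checked on a tiny frame. This is a sketch on hypothetical data (the column names name, empty and weight are made up), not the asker's frame. Every row has a NaN in the empty column, which is the same situation that makes a bare dropna() return nothing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the "empty" column is NaN in every row.
df = pd.DataFrame(
    {
        "name": ["GBW COD", "POLLOCK", "WHAKE"],
        "empty": [np.nan, np.nan, np.nan],
        "weight": [2280.0, np.nan, 441.0],
    }
)

# Defaults are axis=0, how="any": drop every row containing any NaN.
rows_dropped = df.dropna()
print(len(rows_dropped))  # 0, because each row has a NaN in "empty"

# axis=1 with how="all": drop only the columns that are entirely NaN.
cols_dropped = df.dropna(axis=1, how="all")
print(cols_dropped.columns.tolist())  # ['name', 'weight']

# inplace=True modifies df itself and returns None, so nothing is assigned.
df.dropna(axis=1, how="all", inplace=True)
print(df.shape)  # (3, 2)
```

Reassigning the result and passing inplace=True leave the frame in the same state; which one to use is a matter of style.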