Python pandas DataFrame: remove constant columns
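Statement: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20209600/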
Asked by user1802143
I have a dataframe that may or may not have columns that are the same value. For example
row A B
1 9 0
2 7 0
3 5 0
4 2 0
I'd like to return just
row A
1 9
2 7
3 5
4 2
Is there a simple way to identify if any of these columns exist and then remove them?
Accepted answer by chthonicdaemon
I believe this option will be faster than the other answers here, as it traverses the data frame only once for the comparison and short-circuits if a non-unique value is found.
>>> df
0 1 2
0 1 9 0
1 2 7 0
2 3 7 0
>>> df.loc[:, (df != df.iloc[0]).any()]
0 1
0 1 9
1 2 7
2 3 7
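The one-liner above can be wrapped in a reusable function. In the sketch below the function name and sample frame are made up for illustration; it also guards against a frame with no rows and shows one behaviour worth knowing: since NaN never compares equal to NaN, a column that is entirely NaN is kept rather than dropped.

```python
import numpy as np
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns with at least one value that differs from row 0."""
    if df.empty:
        # df.iloc[0] would raise IndexError on a frame with no rows
        return df.copy()
    # Element-wise comparison with the first row, broadcast down every column
    differs_from_first = df.ne(df.iloc[0])
    # A column is kept if any of its cells differs from its first cell
    return df.loc[:, differs_from_first.any()]


df = pd.DataFrame({"A": [9, 7, 5, 2], "B": [0, 0, 0, 0], "C": [np.nan] * 4})
print(drop_constant_columns(df).columns.tolist())  # ['A', 'C']
```

If all-NaN columns should also go, combine this with a test such as df.notna().any().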
Answer by DSM
Ignoring NaNs like usual, a column is constant if nunique() == 1. So:
>>> df
A B row
0 9 0 1
1 7 0 2
2 5 0 3
3 2 0 4
>>> df = df.loc[:,df.apply(pd.Series.nunique) != 1]
>>> df
A row
0 9 1
1 7 2
2 5 3
3 2 4
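In current pandas, DataFrame.nunique computes the per-column count directly, so the apply call is not needed. The sample data below is made up to show how the dropna parameter changes which columns count as constant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "C": [1.0, np.nan, 1.0, np.nan],  # constant apart from missing values
})

# Default dropna=True: NaN is ignored, so C has one distinct value and is dropped.
ignore_nan = df.loc[:, df.nunique() != 1]

# dropna=False: NaN counts as a value, so C has two distinct values and is kept.
count_nan = df.loc[:, df.nunique(dropna=False) != 1]

print(ignore_nan.columns.tolist())  # ['A']
print(count_nan.columns.tolist())   # ['A', 'C']
```

With the default dropna=True, a column that is entirely NaN has nunique() == 0, so the != 1 test keeps it; using > 1 instead would drop it as well.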
Answer by Hng
Assuming that every column of the DataFrame is numeric, you can try:
>>> df = df.loc[:, df.var() != 0.0]
which keeps only the columns with non-zero variance, i.e. it removes the constant (variance = 0) columns.
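A small check of the variance approach on made-up data. The mask has to be != 0.0 to keep the non-constant columns; == 0.0 would select only the constant ones.

```python
import pandas as pd

df = pd.DataFrame({"A": [9, 7, 5, 2], "B": [0, 0, 0, 0], "C": [1.5, 1.5, 1.5, 1.5]})

# Constant columns have variance exactly 0; keep everything else.
non_constant = df.loc[:, df.var() != 0.0]
print(non_constant.columns.tolist())  # ['A']
```

Note that var() uses ddof=1 by default, so on a single-row frame every variance is NaN and, because NaN != 0.0, every column is kept.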
If the DataFrame mixes numeric and object columns, you should try:
>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)
which will drop constant columns of numeric type only (the numeric columns end up before the object columns in the result).
If you also want to drop constant object ("enum") columns, you should try:
>>> import numpy as np
>>> import pandas as pd
>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> # keep an object column only if it holds more than one distinct value
>>> enum_df = enum_df.loc[:, [len(np.unique(x)) != 1 for x in enum_df.T.to_numpy()]]
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)
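A minimal sketch (the function name is hypothetical) that applies the same variance test to numeric columns and a distinct-value count to every other column, without splitting and re-concatenating the frame, so the original column order is preserved. It assumes column names are unique.

```python
import pandas as pd


def drop_constant_mixed(df: pd.DataFrame) -> pd.DataFrame:
    """Drop constant columns from a frame that mixes numeric and non-numeric dtypes."""
    numeric = df.select_dtypes(include="number")
    others = df.select_dtypes(exclude="number")
    # One boolean per column: True means "not constant, keep it"
    keep = pd.concat([numeric.var() != 0.0, others.nunique() != 1])
    # Put the mask back into the original column order before selecting
    return df.loc[:, keep.reindex(df.columns)]


df = pd.DataFrame({
    "t": ["p", "q", "p", "q"],
    "B": [0, 0, 0, 0],
    "A": [9, 7, 5, 2],
    "s": ["x", "x", "x", "x"],
})
print(drop_constant_mixed(df).columns.tolist())  # ['t', 'A']
```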
Answer by dreyco676
Here is my solution, since I needed to handle both object and numerical columns. I'm not claiming it's super efficient, but it gets the job done.
def drop_constants(df):
    """iterate through columns and remove columns with constant values (all same)"""
    columns = df.columns.values
    for col in columns:
        # drop col if unique values is 1 (NaN counts as a value here)
        if df[col].nunique(dropna=False) == 1:
            del df[col]  # note: this modifies the caller's DataFrame in place
    return df
One caveat: it won't work on columns containing lists or arrays, because those values are not hashable.
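One possible workaround for the unhashable case is to fall back to comparing string representations. The sketch below is an assumption-laden variant (the function name is hypothetical): it treats two cells as equal when their str() output matches, and unlike the function above it returns a new frame instead of deleting columns from the input.

```python
import pandas as pd


def drop_constants_unhashable_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without constant columns, tolerating list/array cells."""
    keep = []
    for col in df.columns:
        series = df[col]
        try:
            n = series.nunique(dropna=False)
        except TypeError:
            # e.g. "unhashable type: 'list'": compare string representations instead
            n = series.astype(str).nunique(dropna=False)
        if n != 1:
            keep.append(col)
    return df[keep].copy()


df = pd.DataFrame({
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "L": [[1, 2], [1, 2], [1, 2], [1, 2]],  # constant list column
    "M": [[1], [2], [1], [2]],              # varying list column
})
print(drop_constants_unhashable_safe(df).columns.tolist())  # ['A', 'M']
```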
Answer by Yantraguru
I compared various methods on a data frame of size 120 × 10000 and found the most efficient one to be:
def drop_constant_column(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    return dataframe.loc[:, (dataframe != dataframe.iloc[0]).any()]
1 loop, best of 3: 237 ms per loop
The other contenders were:
def drop_constant_columns(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    result = dataframe.copy()
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            result = result.drop(column,axis=1)
    return result
1 loop, best of 3: 19.2 s per loop
def drop_constant_columns_2(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            dataframe.drop(column,inplace=True,axis=1)
    return dataframe
1 loop, best of 3: 317 ms per loop
def drop_constant_columns_3(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = [col for col in dataframe.columns if len(dataframe[col].unique()) > 1]
    return dataframe[keep_columns].copy()
1 loop, best of 3: 358 ms per loop
def drop_constant_columns_4(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = dataframe.columns[dataframe.nunique()>1]
    return dataframe.loc[:,keep_columns].copy()
1 loop, best of 3: 1.8 s per loop
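To rerun a comparison like this on your own machine, a small timing harness can help. This is a sketch under stated assumptions: the random integer test frame, the number of constant columns, and all function names are made up, and timings depend on hardware and pandas version, so no expected numbers are given. It also reports how many columns each method keeps; here they agree because the test data has no NaN, but unique() counts NaN as a value while nunique() ignores it by default, so on data with missing values they can differ.

```python
import functools
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_cols, n_constant, seed=0):
    """Hypothetical test data: random ints, with the first n_constant columns set to 0."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 100, size=(n_rows, n_cols))
    data[:, :n_constant] = 0
    return pd.DataFrame(data)


def by_first_row(df):
    # Compare every row with row 0 (the accepted answer's approach)
    return df.loc[:, (df != df.iloc[0]).any()]


def by_nunique(df):
    # Vectorised distinct-value count per column (NaN ignored by default)
    return df.loc[:, df.nunique() > 1]


def by_loop(df):
    # Python-level loop over columns using Series.unique()
    return df[[c for c in df.columns if len(df[c].unique()) > 1]]


def run_benchmark(n_rows, n_cols, n_constant, repeat=3):
    """Return {function name: (best time in seconds, number of columns kept)}."""
    df = make_frame(n_rows, n_cols, n_constant)
    results = {}
    for fn in (by_first_row, by_nunique, by_loop):
        best = min(timeit.repeat(functools.partial(fn, df), number=1, repeat=repeat))
        results[fn.__name__] = (best, fn(df).shape[1])
    return results
```

For example, run_benchmark(120, 10000, n_constant=50) uses a frame of the same size as the comparison above.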
Answer by vasili111
Many examples in this thread do not work properly. See my answer for a collection of examples that work.

