Python: Deleting multiple columns based on column names in Pandas
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/28538536/
Deleting multiple columns based on column names in Pandas
Asked by Peadar Coyle
I have some data, and when I import it I get the following unneeded columns. I'm looking for an easy way to delete all of them:
'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31',
'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35',
'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43',
'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51',
'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55',
'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
'Unnamed: 60'
The columns are 0-indexed, so I tried something like:
df.drop(df.columns[[22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32 ,55]], axis=1, inplace=True)
But this isn't very efficient. I tried writing some for loops, but that struck me as bad Pandas practice. Hence I ask the question here.
I've seen some similar examples (Drop multiple columns pandas), but they don't answer my question.
Accepted answer by EdChum
I don't know what you mean by inefficient, but if you mean in terms of typing, it could be easier to just select the columns of interest and assign the result back to the df:
df = df[cols_of_interest]
where cols_of_interest is a list of the columns you care about.
Or you can slice the columns and pass the result to drop:
df.drop(df.loc[:, 'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)  # .loc replaces .ix, which was removed in pandas 1.0
The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
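A minimal sketch of the label-based slice on a toy frame, assuming a recent pandas in which .loc is used for label slicing (.ix is no longer available). The column names are made up for illustration. Note that a label slice includes both endpoints, and that drop returns a new DataFrame unless inplace=True is passed.

```python
import pandas as pd

# Toy frame; the column names are illustrative only.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "b"],
)

# Label-based slices with .loc include both the start and the end label.
to_drop = df.loc[:, "Unnamed: 1":"Unnamed: 3"].columns

# drop returns a new DataFrame; df itself is left unchanged.
trimmed = df.drop(to_drop, axis=1)
print(list(trimmed.columns))
```

Taking .columns from the slice directly gives the same labels as going through head(0) first, since only the names are used.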
Update
Another, simpler method is to use the boolean mask from str.contains and invert it to mask the columns:
In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df
Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []
In [4]:
~df.columns.str.contains('Unnamed:')
Out[4]:
array([ True, False, False, True], dtype=bool)
In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]
Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
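The same mask can also be applied through .loc, which selects by position in the mask and so behaves predictably when column names are duplicated. This is a sketch on a toy frame with one row of made-up data, not part of the original answer.

```python
import pandas as pd

# Toy frame with a duplicated "Unnamed: 1" column, as in the session above.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a", "Unnamed: 1", "Unnamed: 1", "foo"],
)

# True for every column whose name does NOT contain "Unnamed:".
keep = ~df.columns.str.contains("Unnamed:")

# Boolean selection along the column axis.
cleaned = df.loc[:, keep]
print(list(cleaned.columns))
```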
Answer by knightofni
This is probably a good way to do what you want. It will delete all columns that contain 'Unnamed' in their header.
for col in df.columns:
    if 'Unnamed' in col:
        del df[col]
Answer by Shivgan
The below worked for me:
for col in df:
    if 'Unnamed' in col:
        # del df[col]
        print(col)
        try:
            df.drop(col, axis=1, inplace=True)
        except Exception:
            pass
Answer by Philipp Schwarz
By far the simplest approach is:
yourdf.drop(['columnheading1', 'columnheading2'], axis=1, inplace=True)
Answer by Peter
You can do this in one line and in one go:
df.drop([col for col in df.columns if "Unnamed" in col], axis=1, inplace=True)
This involves less moving around/copying of the object than the solutions above.
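For comparison, here is a sketch of the same list comprehension without inplace=True, using the columns keyword of drop (available in reasonably recent pandas versions). The toy frame and its column names are assumptions for illustration.

```python
import pandas as pd

# Toy frame; the column names are illustrative only.
df = pd.DataFrame(
    [[1, 2, 3]],
    columns=["a", "Unnamed: 1", "b"],
)

# Collect every column whose name contains "Unnamed".
unnamed = [col for col in df.columns if "Unnamed" in col]

# columns=... is equivalent to passing the labels with axis=1.
result = df.drop(columns=unnamed)
print(list(result.columns))
```

The substring test assumes every column name is a string; a frame with integer column labels would need the names converted with str first.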
Answer by sheldonzy
My personal favorite, and easier than the answers I have seen here (for multiple columns):
df.drop(df.columns[22:56], axis=1, inplace=True)
Or create a list of the columns to drop:
col = list(df.columns)[22:56]
df.drop(col, axis=1, inplace=True)
Answer by px06
Not sure if this solution has been mentioned anywhere yet, but one way to do it is pandas.Index.difference.
>>> df = pd.DataFrame(columns=['A','B','C','D'])
>>> df
Empty DataFrame
Columns: [A, B, C, D]
Index: []
>>> to_remove = ['A','C']
>>> df = df[df.columns.difference(to_remove)]
>>> df
Empty DataFrame
Columns: [B, D]
Index: []
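One caveat worth sketching: by default Index.difference tries to sort its result, so the remaining columns may come back in a different order than in the original frame. Passing sort=False keeps the original order. The deliberately unsorted column names below are an assumption for illustration.

```python
import pandas as pd

# Columns are deliberately not in alphabetical order.
df = pd.DataFrame(columns=["D", "B", "A", "C"])
to_remove = ["A", "C"]

# Default behaviour: the resulting labels are sorted.
sorted_cols = df.columns.difference(to_remove)

# sort=False preserves the order in which the columns appear in df.
ordered_cols = df.columns.difference(to_remove, sort=False)

print(list(sorted_cols))
print(list(ordered_cols))
```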
Answer by Sarah
df = df[[col for col in df.columns if 'Unnamed' not in col]]
Answer by Maddu Swaroop
You can pass the column names as a list and specify the axis as 0 or 1:
- axis=0: the labels refer to the index, so rows are dropped
- axis=1: the labels refer to the columns, so columns are dropped
By default axis=0, so axis=1 is needed to drop columns.
data.drop(["Colname1","Colname2","Colname3","Colname4"],axis=1)
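A short sketch of how the axis argument behaves for drop specifically: axis=1 (or, equivalently, the columns keyword) looks the labels up among the column names, while the default axis=0 looks them up in the row index and raises a KeyError when they are not found there. The frame and the "Other" column are made up for illustration.

```python
import pandas as pd

# Toy frame; "Other" is a hypothetical column that should survive.
data = pd.DataFrame({"Colname1": [1, 2], "Colname2": [3, 4], "Other": [5, 6]})

# These two calls are equivalent ways of dropping columns.
by_axis = data.drop(["Colname1", "Colname2"], axis=1)
by_keyword = data.drop(columns=["Colname1", "Colname2"])

# With the default axis=0 the labels are searched for in the row index,
# where they do not exist, so pandas raises a KeyError.
try:
    data.drop(["Colname1", "Colname2"])
    raised = False
except KeyError:
    raised = True

print(list(by_axis.columns), raised)
```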
Answer by Niedson
Simple and easy. Remove all columns after the 22nd.
df.drop(columns=df.columns[22:]) # love it
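Note that this call returns a new DataFrame rather than modifying df, so the result has to be assigned. A minimal sketch on a toy frame with 25 made-up column names, also showing a positional selection with iloc that gives the same result:

```python
import pandas as pd

# Toy frame with 25 columns named c0 .. c24 (names are illustrative only).
df = pd.DataFrame([list(range(25))], columns=[f"c{i}" for i in range(25)])

# Drop everything from position 22 onward; the first 22 columns remain.
trimmed = df.drop(columns=df.columns[22:])

# Keeping the first 22 columns by position yields the same frame.
same = df.iloc[:, :22]

print(trimmed.shape, df.shape)
```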