Original URL: http://stackoverflow.com/questions/14363640/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Python Pandas - Deleting multiple series from a data frame in one command
Asked by Grant M.
In short ... I have a Python Pandas data frame that is read in from an Excel file using read_table. I would like to keep a handful of the series from the data, and purge the rest. I know that I can just delete what I don't want one-by-one using del data['SeriesName'], but what I'd rather do is specify what to keep instead of specifying what to delete.
If the simplest answer is to copy the existing data frame into a new data frame that only contains the series I want, and then delete the existing frame in its entirety, I would be satisfied with that solution ... but if that is indeed the best way, can someone walk me through it?
TIA ... I'm a newb to Pandas. :)
Answer by Zelazny7
You can use the DataFrame.drop function to remove columns. You have to pass the axis=1 option for it to work on columns and not rows. Note that it returns a copy, so you have to assign the result to a variable (here, back to df):
In [1]: from pandas import *
In [2]: df = DataFrame(dict(x=[0,0,1,0,1], y=[1,0,1,1,0], z=[0,0,1,0,1]))
In [3]: df
Out[3]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1
In [4]: df = df.drop(['x','y'], axis=1)
In [5]: df
Out[5]:
   z
0  0
1  0
2  1
3  0
4  1
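To illustrate the point about copies, here is a minimal sketch showing that drop leaves the original frame untouched unless the result is assigned. The variable names trimmed and trimmed_kw are made up for the example, and the columns= keyword in the last line assumes a reasonably recent pandas (0.21 or later); on older versions stick with axis=1 as above.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})

# drop returns a new DataFrame; df itself still has all three columns
trimmed = df.drop(["x", "y"], axis=1)
print(list(df.columns))
print(list(trimmed.columns))

# Equivalent spelling on newer pandas versions (assumption: pandas >= 0.21)
trimmed_kw = df.drop(columns=["x", "y"])
```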
Answer by Theodros Zelleke
Basically the same as Zelazny7's answer -- just specifying what to keep:
In [68]: df
Out[68]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1
In [70]: df = df[['x','z']]
In [71]: df
Out[71]:
   x  z
0  0  0
1  0  0
2  1  1
3  0  0
4  1  1
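Since the question asks for "specify what to keep", here is a rough sketch of how a keep list relates to the drop approach. The names keep, kept, to_drop and kept_via_drop are made up for the example; the .copy() call is optional and is only there to make the new frame clearly independent of the old one.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})
keep = ["x", "z"]  # hypothetical list of columns to keep

# Select the columns to keep; .copy() makes the result independent of df
kept = df[keep].copy()

# Same result expressed as a drop: remove every column not in `keep`
to_drop = df.columns.difference(keep)
kept_via_drop = df.drop(to_drop, axis=1)
```

Note that df[keep] raises a KeyError if any name in the list is not a column, which is usually what you want when the list is typed by hand.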
*Edit*
You can specify a large number of columns through indexing/slicing into the DataFrame.columns object.
This object, of type pandas.Index, behaves much like an ordered sequence of column labels (with some extended functionality), so it supports indexing and slicing.
See this extension of above examples:
In [4]: df.columns
Out[4]: Index([x, y, z], dtype=object)
In [5]: df[df.columns[1:]]
Out[5]:
   y  z
0  1  0
1  0  0
2  1  1
3  1  0
4  0  1
In [7]: df.drop(df.columns[1:], axis=1)
Out[7]:
   x
0  0
1  0
2  1
3  0
4  1
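As a side note to the slicing examples, the same range of columns can also be picked directly on the frame with .loc (by label) or .iloc (by position). This is only a sketch of an alternative spelling, not something the answer itself uses; by_label and by_position are made-up names.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1], "y": [1, 0, 1, 1, 0], "z": [0, 0, 1, 0, 1]})

# Label-based slice: both endpoints are included
by_label = df.loc[:, "y":"z"]

# Position-based slice: the end position is excluded, as in normal Python slicing
by_position = df.iloc[:, 1:3]
```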
Answer by oW_
You can also specify a list of columns to keep with the usecols option in pandas.read_table. This speeds up the loading process as well.
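A small sketch of the usecols idea, assuming tab-separated input (the read_table default). The in-memory string raw is a made-up stand-in for a real file path, so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for a file on disk
raw = "x\ty\tz\n0\t1\t0\n0\t0\t0\n1\t1\t1\n"

# Only the listed columns end up in the frame; 'y' is skipped at load time
df = pd.read_table(io.StringIO(raw), usecols=["x", "z"])
print(df)
```

If the source really is an Excel workbook rather than a delimited text file, pandas.read_excel accepts a usecols argument as well.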

