Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/51167612/

Date: 2020-08-19 19:46:35  Source: igfitidea

What is the best way to remove columns in pandas?

python, pandas, dataframe

Asked by Mohamed Thasin ah

I am raising this question for my own learning. As far as I know, the following are the different ways to remove columns from a pandas DataFrame.

Option - 1:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10], 'c': [11, 12, 13, 14, 15]})
del df['a']  # removes column 'a' from df in place

Option - 2:

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10], 'c': [11, 12, 13, 14, 15]})
df = df.drop('a', axis=1)  # axis must be a keyword argument in pandas 2.0 and later

Option - 3:

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10], 'c': [11, 12, 13, 14, 15]})
df = df[['b', 'c']]

  1. What is the best approach among these?
  2. Any other approaches to achieve the same?

Accepted answer by YaOzI

According to the docs:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

And pandas.DataFrame.drop:

Drop specified labels from rows or columns.

So, I think we should stick with df.drop. Why? I think the pros are:

  1. It gives us more control over the removal:

    # Returns a NEW DataFrame object and leaves the original `df` untouched.
    df.drop('a', axis=1)
    # Modifies `df` in place and returns None.
    df.drop('a', axis=1, inplace=True)

  2. It can handle more complicated cases through its arguments. E.g. with level we can handle MultiIndex deletion, and with errors we can prevent some bugs.

  3. It's a more unified and object-oriented way.
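
The points above can be sketched in a few lines. This is only an illustration under assumed data: the frame contents, the label 'not_a_column', and the two-level column index are made up for the example and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Default behaviour: a new DataFrame is returned, `df` keeps all columns.
smaller = df.drop('a', axis=1)

# inplace=True mutates the frame and returns None, so do not assign the result.
copy_of_df = df.copy()
result = copy_of_df.drop('a', axis=1, inplace=True)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
# 'not_a_column' is a hypothetical label used only for this demonstration.
tolerant = df.drop(columns=['a', 'not_a_column'], errors='ignore')

# level= targets one level of a MultiIndex (here, the inner column level).
wide = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_product([['x', 'y'], ['a', 'b']]),
)
without_a = wide.drop('a', axis=1, level=1)

print(list(smaller.columns), result, list(without_a.columns))
```

Assigning the result of an inplace=True call is a common slip, since the name then ends up bound to None.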

And as @jezrael noted in his answer:

Option 1: Using the keyword del is a limited approach.

Option 3: df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
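
A small sketch of that difference, assuming a second name (`original`, introduced here only for illustration) that refers to the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
original = df  # a second name bound to the same object

# Selection: `df` is rebound to a new DataFrame; the old object is unchanged.
df = df[['b', 'c']]
columns_after_selection = list(original.columns)

# Deletion: `del` mutates the object itself, so every name bound to it sees it.
del original['a']
columns_after_del = list(original.columns)

print(columns_after_selection, columns_after_del)
```

So if other code still holds a reference to the original frame, option 3 leaves column 'a' visible there, while del removes it for everyone.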

Answer by razmik

The recommended way to delete a column or row from a pandas DataFrame is to use drop.

To delete a column,

df.drop('column_name', axis=1, inplace=True)

To delete a row,

df.drop('row_index', axis=0, inplace=True)
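
As a possible alternative to axis=0 and axis=1, drop also accepts index= and columns= keywords, which can read more clearly and take lists of labels. A minimal sketch, with row labels 'r1' to 'r3' invented for the example:

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['r1', 'r2', 'r3'],  # hypothetical row labels
)

# Drop several columns at once.
no_ab = df.drop(columns=['a', 'b'])

# Drop a single row by its index label.
no_r1 = df.drop(index='r1')

# Rows and columns can be dropped in the same call.
trimmed = df.drop(index=['r1', 'r3'], columns='c')

print(no_ab.shape, no_r1.shape, trimmed.shape)
```

None of these calls use inplace=True, so `df` itself is left unchanged.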

You can refer to this post to see a detailed discussion of column deletion approaches.

Answer by aydow

From a speed perspective, option 1 seems to be the best. Obviously, based on the other answers, that doesn't mean it's actually the best option.

In [52]: import timeit

In [53]: s1 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: del df['a']
    ...: """

In [54]: s2 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: df=df.drop('a', axis=1)
    ...: """

In [55]: s3 = """
    ...: import pandas as pd
    ...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
    ...: df=df[['b','c']]
    ...: """

In [56]: timeit.timeit(stmt=s1, number=100000)
Out[56]: 53.37321400642395

In [57]: timeit.timeit(stmt=s2, number=100000)
Out[57]: 79.68139410018921

In [58]: timeit.timeit(stmt=s3, number=100000)
Out[58]: 76.25269913673401
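
Note that the statements timed above also include the pandas import and the DataFrame construction, which may dominate the measurement. A possible variation, sketched here as an assumption rather than as the answerer's method, moves the shared work into timeit's setup argument and times only a copy plus the removal. The names `base`, `statements`, and `timings` are introduced for this sketch, and results will vary by machine and pandas version.

```python
import timeit

# Shared setup runs once per timing call and is excluded from the measurement.
setup = """
import pandas as pd
base = pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
"""

# Each statement copies `base` first so that every run starts with column 'a'.
statements = {
    "del": "df = base.copy(); del df['a']",
    "drop": "df = base.copy(); df = df.drop('a', axis=1)",
    "select": "df = base.copy(); df = df[['b', 'c']]",
}

timings = {}
for name, stmt in statements.items():
    timings[name] = timeit.timeit(stmt=stmt, setup=setup, number=1000)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 1000 runs")
```

The copy cost is still part of each measurement, but it is the same for all three statements, so the differences between them are easier to attribute to the removal itself.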

Answer by jezrael

In my opinion, options 2 and 3 are the best, because the first has limits: you can remove only one column at a time, and you cannot use dot notation (del df.a).

Option 3 is not deleting but selecting, and piRSquared wrote a nice answer covering multiple possible solutions based on the same idea.
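
To make both points concrete, here is a rough sketch. The two selection expressions are common idioms chosen for illustration and are not necessarily the ones in piRSquared's answer. As far as I know, `del df.a` fails with an AttributeError because DataFrame supports attribute access for reading columns but not for deleting them.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Dot notation works for reading a column, but not for deleting it.
try:
    del df.a
    dot_delete_failed = False
except AttributeError:
    dot_delete_failed = True

# Selection-style alternatives that keep everything except column 'a'.
by_mask = df.loc[:, df.columns != 'a']
by_difference = df[df.columns.difference(['a'])]  # note: difference() sorts the labels

print(dot_delete_failed, list(by_mask.columns), list(by_difference.columns))
```

Like option 3, both alternatives build a new DataFrame and leave `df` as it was.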

3.solution 不是删除,而是选择和piRSquared为具有相同想法的多个可能解决方案创建了很好的答案。