Python: How to drop a list of rows from a Pandas dataframe?

Notice: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14661701/

Time: 2020-08-18 12:06:01  Source: igfitidea

python pandas

Asked by bigbug

I have a dataframe df :

>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459

Then I want to drop the rows at certain positions given in a list, say [1, 2, 4], which leaves:

                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459

How can I do that, and which function does it?

Accepted answer by Theodros Zelleke

Use DataFrame.drop and pass it a Series of index labels:

In [65]: df
Out[65]: 
       one  two
one      1    4
two      2    3
three    3    2
four     4    1


In [66]: df.drop(df.index[[1,3]])
Out[66]: 
       one  two
one      1    4
three    3    2
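As a minimal, self-contained sketch, the same call can be applied to the positions from the question ([1, 2, 4]). The frame below is a hypothetical stand-in with made-up row labels, not the asker's actual data:

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: six rows, string labels.
df = pd.DataFrame(
    {"sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907]},
    index=["r0", "r1", "r2", "r3", "r4", "r5"],
)

positions = [1, 2, 4]            # row positions to remove
labels = df.index[positions]     # translate positions into index labels
result = df.drop(labels)         # returns a new frame; df itself is unchanged

print(result.index.tolist())     # ['r0', 'r3', 'r5']
```

Indexing df.index with a list of positions is what turns "sequence numbers" into the labels that drop expects.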

Answer by user3155053

Note that you may need the inplace=True argument when you want the drop to modify the dataframe in place.

df.drop(df.index[[1,3]], inplace=True)

Without inplace=True, drop returns a new dataframe and leaves df unchanged, so use this form if you are not assigning the result. http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
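A small sketch of the difference between the two forms, using a hypothetical four-row frame modelled on the accepted answer:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["one", "two", "three", "four"])

# Without inplace, drop returns a new frame and df keeps all four rows.
dropped_copy = df.drop(df.index[[1, 3]])
rows_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
returned = df.drop(df.index[[1, 3]], inplace=True)
```

Reassigning with df = df.drop(...) gives the same final df as the inplace call.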

Answer by danielhadar

You can also pass DataFrame.drop the label itself (instead of a Series of index labels):

In[17]: df
Out[17]: 
            a         b         c         d         e
one  0.456558 -2.536432  0.216279 -1.305855 -0.121635
two -1.015127 -0.445133  1.867681  2.179392  0.518801

In[18]: df.drop('one')
Out[18]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

Which is equivalent to:

In[19]: df.drop(df.index[[0]])
Out[19]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

Answer by mepstein

In a comment on @theodros-zelleke's answer, @j-jones asked what to do if the index is not unique. I had to deal with such a situation. What I did was rename the duplicates in the index before calling drop(), along these lines:

dropped_indexes = <determine-indexes-to-drop>
df.index = rename_duplicates(df.index)
df.drop(df.index[dropped_indexes], inplace=True)

where rename_duplicates() is a function I defined that goes through the elements of the index and renames the duplicates. I used the same renaming pattern that pd.read_csv() uses on columns, i.e., "%s.%d" % (name, count), where name is the name of the row and count is how many times it has occurred previously.
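The answer does not show rename_duplicates(), so the following is only a hypothetical sketch of what such a helper might look like, assuming the "%s.%d" pattern described above and leaving the first occurrence of each label unchanged:

```python
from collections import Counter

import pandas as pd


def rename_duplicates(index):
    """Hypothetical helper: suffix repeated labels as 'name.count'."""
    seen = Counter()
    new_labels = []
    for name in index:
        count = seen[name]  # how many times this label occurred before
        new_labels.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] += 1
    return pd.Index(new_labels)


df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["a", "b", "a", "a"])
df.index = rename_duplicates(df.index)   # a, b, a.1, a.2
df.drop(df.index[[2]], inplace=True)     # removes only the second 'a'
```

This sketch does not guard against a generated label such as "a.1" colliding with a label that already exists, and it does not carry over the index name.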

Answer by Dennis Golomazov

If the DataFrame is huge, and the number of rows to drop is large as well, then a simple drop by index, df.drop(df.index[]), takes too much time.

In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.

Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).

# sorted() keeps the original row order; set iteration order is not guaranteed
indexes_to_keep = sorted(set(range(df.shape[0])) - set(indexes_to_drop))
df_sliced = df.take(indexes_to_keep)

In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
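A related variant of the "keep the remaining rows" idea, not the author's exact method, is to build a boolean NumPy mask and select with iloc. The tiny frame below is a made-up example, and no timing claim is made for this variant:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; the answer's real data had 100M rows.
df = pd.DataFrame({"x": np.arange(6.0)}, index=list("abcdef"))
indexes_to_drop = [1, 2, 4]

keep = np.ones(len(df), dtype=bool)   # start by keeping every row
keep[indexes_to_drop] = False         # switch off the positions to drop
df_sliced = df.iloc[keep]             # boolean selection by position

# Same rows as the plain drop-by-label approach.
same = df_sliced.equals(df.drop(df.index[indexes_to_drop]))
```

Because the mask is positional, it keeps the original row order without any sorting step.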

Answer by Divyansh

If I want to drop a row which has let's say index x, I would do the following:

df = df[df.index != x]

If I wanted to drop multiple indices (say these indices are in the list unwanted_indices), I would do:

desired_indices = [i for i in range(len(df.index)) if i not in unwanted_indices]
desired_df = df.iloc[desired_indices]
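Note that the single-row form above compares index labels, while the iloc form works on row positions. If unwanted_indices holds labels rather than positions, a sketch along the following lines may be closer to what is wanted (the frame and the name unwanted_labels are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose labels differ from its row positions.
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=[10, 20, 30, 40])
unwanted_labels = [20, 40]

# Keep every row whose label is not in the unwanted list.
desired_df = df[~df.index.isin(unwanted_labels)]
```

Index.isin builds the boolean mask in one vectorized step, so no Python-level loop over the rows is needed.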

Answer by Krishnaprasad Challuru

I solved this in a simpler way - just in 2 steps.

Step 1: First form a dataframe containing the unwanted rows.

Step 2: Use the index of this unwanted dataframe to drop the rows from the original dataframe.

Example:

Suppose you have a dataframe df which has many columns, including 'Age', which is an integer. Now let's say you want to drop all the rows where 'Age' is negative.

Step 1: df_age_negative = df[ df['Age'] < 0 ]

Step 2: df = df.drop(df_age_negative.index, axis=0)

Hope this is simpler and helps you.
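The two steps can be sketched end to end as follows. The data is made up, and the one-step boolean mask at the end is an alternative shown for comparison, not part of the answer:

```python
import pandas as pd

# Hypothetical data with one negative age.
df = pd.DataFrame({"Name": ["a", "b", "c"], "Age": [31, -5, 42]})

# Step 1: collect the unwanted rows.
df_age_negative = df[df["Age"] < 0]

# Step 2: drop them from the original frame by index label.
two_step = df.drop(df_age_negative.index, axis=0)

# Alternative: keep the complement of the same condition in one step.
one_step = df[~(df["Age"] < 0)]
```

The two results match here because the index is unique. With duplicate index labels, drop removes every row sharing a dropped label, so the mask form is the safer of the two in that case.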

Answer by cyber-math

Here is a somewhat specific example I would like to show. Say you have many duplicate entries in some of your rows. If the entries are strings, you can easily use string methods to find all the indexes to drop.

ind_drop = df[df['column_of_strings'].apply(lambda x: x.startswith('Keyword'))].index

And now drop those rows using their indexes:

new_df = df.drop(ind_drop)
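The same lookup can also be written with the vectorized .str accessor instead of apply with a lambda. This is a sketch on made-up data; na=False is used here on the assumption that missing values should count as non-matches:

```python
import pandas as pd

# Hypothetical column of strings.
df = pd.DataFrame(
    {"column_of_strings": ["Keyword A", "other", "Keyword B", "misc"]}
)

# Boolean mask of rows whose string starts with 'Keyword'.
mask = df["column_of_strings"].str.startswith("Keyword", na=False)

ind_drop = df[mask].index      # labels of the matching rows
new_df = df.drop(ind_drop)     # new frame without those rows
```

Unlike the lambda version, this does not raise on missing values in the column.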

Answer by Adam Zeldin

Determining the index from a boolean mask as described above, e.g.

df[df['column'].isin(values)].index

can be more memory intensive than determining the index using this method:

pd.Index(np.where(df['column'].isin(values))[0])

applied like so:

df.drop(pd.Index(np.where(df['column'].isin(values))[0]), inplace = True)

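This method is useful when dealing with large dataframes and limited memory. Note that np.where returns row positions, so passing them to drop as labels assumes the dataframe has the default integer index.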
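np.where returns row positions, while drop expects labels, so the two only coincide on a default 0..n-1 index. For any other index, the positions can be mapped through df.index first. A sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels do not match the row positions.
df = pd.DataFrame({"column": ["a", "b", "c", "b"]}, index=[100, 101, 102, 103])
values = ["b"]

# Positions of the matching rows: array([1, 3]).
positions = np.where(df["column"].isin(values))[0]

# Convert positions to labels before dropping.
result = df.drop(df.index[positions])
```

On this frame, passing pd.Index(positions) straight to drop would raise a KeyError, because the labels 1 and 3 do not exist.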

Answer by kamran kausar

Use only the index argument to drop a row:

df.drop(index = 2, inplace = True)

For multiple rows:

df.drop(index=[1,3], inplace = True)
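The index argument takes labels, not positions; the two only coincide on a default integer index. A short sketch on a made-up frame, also showing the errors="ignore" option for labels that may be absent:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3.
df = pd.DataFrame({"v": ["a", "b", "c", "d"]})

# Drop the rows labelled 1 and 3; df itself is unchanged.
out = df.drop(index=[1, 3])

# Label 99 does not exist; errors="ignore" skips it instead of raising.
safe = df.drop(index=[1, 99], errors="ignore")
```

Without errors="ignore", a missing label makes drop raise a KeyError.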