Python: Deleting DataFrame rows in Pandas based on column value

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18172851/

Date: 2020-08-19 10:03:29  Source: igfitidea

Deleting DataFrame row in Pandas based on column value

Tags: python, pandas

Asked by TravisVOX

I have the following DataFrame:

             daysago  line_race rating        rw    wrating
 line_date                                                 
 2007-03-31       62         11     56  1.000000  56.000000
 2007-03-10       83         11     67  1.000000  67.000000
 2007-02-10      111          9     66  1.000000  66.000000
 2007-01-13      139         10     83  0.880678  73.096278
 2006-12-23      160         10     88  0.793033  69.786942
 2006-11-09      204          9     52  0.636655  33.106077
 2006-10-22      222          8     66  0.581946  38.408408
 2006-09-29      245          9     70  0.518825  36.317752
 2006-09-16      258         11     68  0.486226  33.063381
 2006-08-30      275          8     72  0.446667  32.160051
 2006-02-11      475          5     65  0.164591  10.698423
 2006-01-13      504          0     70  0.142409   9.968634
 2006-01-02      515          0     64  0.134800   8.627219
 2005-12-06      542          0     70  0.117803   8.246238
 2005-11-29      549          0     70  0.113758   7.963072
 2005-11-22      556          0     -1  0.109852  -0.109852
 2005-11-01      577          0     -1  0.098919  -0.098919
 2005-10-20      589          0     -1  0.093168  -0.093168
 2005-09-27      612          0     -1  0.083063  -0.083063
 2005-09-07      632          0     -1  0.075171  -0.075171
 2005-06-12      719          0     69  0.048690   3.359623
 2005-05-29      733          0     -1  0.045404  -0.045404
 2005-05-02      760          0     -1  0.039679  -0.039679
 2005-04-02      790          0     -1  0.034160  -0.034160
 2005-03-13      810          0     -1  0.030915  -0.030915
 2004-11-09      934          0     -1  0.016647  -0.016647

I need to remove the rows where line_race is equal to 0. What's the most efficient way to do this?

Accepted answer by tshauck

If I'm understanding correctly, it should be as simple as:

df = df[df.line_race != 0]
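
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its two steps. The small DataFrame is a hypothetical, abridged stand-in for the one in the question, and the names mask and filtered are only illustrative.

```python
import pandas as pd

# Hypothetical miniature of the question's DataFrame (values abridged).
df = pd.DataFrame(
    {"daysago": [62, 83, 475, 504, 515], "line_race": [11, 11, 5, 0, 0]},
    index=pd.to_datetime(
        ["2007-03-31", "2007-03-10", "2006-02-11", "2006-01-13", "2006-01-02"]
    ),
)

# Step 1: build a boolean Series that is True for the rows to keep.
mask = df["line_race"] != 0

# Step 2: index with the mask; this returns a new DataFrame and leaves df as it was.
filtered = df[mask]

print(filtered)
```

Assigning the result back to df, as the answer does, simply rebinds the name to the filtered copy.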

Answer by Phillip Cloud

The best way to do this is with boolean masking:

In [56]: df
Out[56]:
     line_date  daysago  line_race  rating    raw  wrating
0   2007-03-31       62         11      56  1.000   56.000
1   2007-03-10       83         11      67  1.000   67.000
2   2007-02-10      111          9      66  1.000   66.000
3   2007-01-13      139         10      83  0.881   73.096
4   2006-12-23      160         10      88  0.793   69.787
5   2006-11-09      204          9      52  0.637   33.106
6   2006-10-22      222          8      66  0.582   38.408
7   2006-09-29      245          9      70  0.519   36.318
8   2006-09-16      258         11      68  0.486   33.063
9   2006-08-30      275          8      72  0.447   32.160
10  2006-02-11      475          5      65  0.165   10.698
11  2006-01-13      504          0      70  0.142    9.969
12  2006-01-02      515          0      64  0.135    8.627
13  2005-12-06      542          0      70  0.118    8.246
14  2005-11-29      549          0      70  0.114    7.963
15  2005-11-22      556          0      -1  0.110   -0.110
16  2005-11-01      577          0      -1  0.099   -0.099
17  2005-10-20      589          0      -1  0.093   -0.093
18  2005-09-27      612          0      -1  0.083   -0.083
19  2005-09-07      632          0      -1  0.075   -0.075
20  2005-06-12      719          0      69  0.049    3.360
21  2005-05-29      733          0      -1  0.045   -0.045
22  2005-05-02      760          0      -1  0.040   -0.040
23  2005-04-02      790          0      -1  0.034   -0.034
24  2005-03-13      810          0      -1  0.031   -0.031
25  2004-11-09      934          0      -1  0.017   -0.017

In [57]: df[df.line_race != 0]
Out[57]:
     line_date  daysago  line_race  rating    raw  wrating
0   2007-03-31       62         11      56  1.000   56.000
1   2007-03-10       83         11      67  1.000   67.000
2   2007-02-10      111          9      66  1.000   66.000
3   2007-01-13      139         10      83  0.881   73.096
4   2006-12-23      160         10      88  0.793   69.787
5   2006-11-09      204          9      52  0.637   33.106
6   2006-10-22      222          8      66  0.582   38.408
7   2006-09-29      245          9      70  0.519   36.318
8   2006-09-16      258         11      68  0.486   33.063
9   2006-08-30      275          8      72  0.447   32.160
10  2006-02-11      475          5      65  0.165   10.698

UPDATE: Now that pandas 0.13 is out, another way to do this is df.query('line_race != 0').
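
As a rough illustration of the query alternative, the sketch below checks that it selects the same rows as the boolean mask. The sample data and the variable name unwanted are assumptions made for the example; the @ prefix is the documented way to reference a local Python variable inside a query string.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"line_race": [11, 9, 0, 0], "rating": [56, 66, 70, -1]})

by_mask = df[df.line_race != 0]
by_query = df.query("line_race != 0")

# A local variable can be referenced inside the expression with an @ prefix.
unwanted = 0
by_query_var = df.query("line_race != @unwanted")

print(by_query.equals(by_mask))
```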

Answer by wonderkid2

For any future passers-by, it is worth mentioning that the comparison pattern df = df[df.line_race != 0] doesn't do anything when you try to use it to filter out None/missing values.

Does work:

df = df[df.line_race != 0]

Doesn't do anything:

df = df[df.line_race != None]

Does work:

df = df[df.line_race.notnull()]
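
The three cases above can be checked with a small sketch. The float column with NaN entries is an assumed example; dropna with subset is shown as a further option that the answer itself does not mention.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing missing values.
df = pd.DataFrame({"line_race": [11.0, np.nan, 0.0, np.nan]})

# Comparing against None with != is True for every row, so nothing is dropped.
unchanged = df[df.line_race != None]  # noqa: E711 (deliberate, to show the pitfall)

# notnull() is True only where a value is present.
cleaned = df[df.line_race.notnull()]

# dropna restricted to one column selects the same rows.
also_cleaned = df.dropna(subset=["line_race"])

print(len(df), len(unchanged), len(cleaned))
```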

Answer by h3h325

The given answer is correct. Nonetheless, as someone above said, you can use df.query('line_race != 0'), which, depending on your problem, can be much faster. Highly recommended.

Answer by desmond

Just to add another solution, which is particularly useful if you are using the new pandas accessors: the other solutions replace the original DataFrame with a new object and lose the accessors.

df.drop(df.loc[df['line_race']==0].index, inplace=True)
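
A minimal sketch of what "in place" means here, on assumed sample data: after the drop, a second reference to the DataFrame still points at the same, now shorter, object. The name same_object is only illustrative, and the sketch does not test accessor behaviour itself.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"line_race": [11, 0, 9, 0]})
same_object = df  # a second reference to the same DataFrame

# Look up the index labels of the unwanted rows, then drop them in place.
df.drop(df.loc[df["line_race"] == 0].index, inplace=True)

print(same_object is df)
print(list(same_object.index))
```

By contrast, df = df[df.line_race != 0] binds the name df to a new object, so other references keep seeing the unfiltered data.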

Answer by Amruth Lakkavaram

Another way of doing it. It may not be the most efficient way, as the code looks a bit more complex than the code in the other answers, but it is still an alternative way of doing the same thing.

  df = df.drop(df[df['line_race']==0].index)

Answer by Loochie

Though the previous answers are almost the same as what I am going to do, using the index directly does not require another indexing method such as .loc. It can be done in a similar but more concise manner:

df.drop(df.index[df['line_race'] == 0], inplace = True)
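
One caveat applies to all of the drop-based variants: drop removes rows by index label, so it only matches the boolean mask when the labels are unique. The sketch below uses a deliberately contrived DataFrame with a repeated label to show the difference; the data is an assumption for illustration only.

```python
import pandas as pd

# Contrived example: the label "a" appears twice in the index.
df = pd.DataFrame({"line_race": [0, 5, 7]}, index=["a", "a", "b"])

# drop removes every row carrying a matching label, including the row with 5.
by_drop = df.drop(df.index[df["line_race"] == 0])

# The boolean mask removes only the row whose value is actually 0.
by_mask = df[df["line_race"] != 0]

print(len(by_drop), len(by_mask))
```

With a unique index, such as the dates in the question or a default RangeIndex, both approaches select the same rows.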

Answer by Robvh

If you want to delete rows based on multiple values of the column, you could use:

df[(df.line_race != 0) & (df.line_race != 10)]

This drops all rows where line_race is 0 or 10.
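
When the list of unwanted values grows, chaining != comparisons gets unwieldy. One possible alternative, not part of the original answer, is Series.isin combined with the ~ negation operator. The sample data below is assumed.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"line_race": [11, 10, 0, 9, 10]})

# The chained comparison from the answer.
chained = df[(df.line_race != 0) & (df.line_race != 10)]

# The same filter written as "not in this list of values".
with_isin = df[~df.line_race.isin([0, 10])]

print(with_isin)
```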

Answer by Prateek Kumar Singh

Just adding another way, which extends the filter across all columns of the DataFrame:

for column in df.columns:
   df = df[df[column]!=0]
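
For comparison, the same "no zero in any column" filter can probably be written without a Python loop by building one row-wise mask. This is a sketch on assumed data, not the answer's own method.

```python
import pandas as pd

# Hypothetical sample data with zeros in different columns.
df = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 0]})

# Loop version from the answer, applied to a working copy of the reference.
looped = df
for column in df.columns:
    looped = looped[looped[column] != 0]

# Single mask: keep rows where every column is non-zero.
vectorized = df[(df != 0).all(axis=1)]

print(vectorized)
```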

Example:

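import numpy as np

def z_score(data, count):
    # Drop rows whose value in any column lies more than `threshold`
    # standard deviations from that column's mean; count the outliers found.
    threshold = 3
    for column in data.columns:
        mean = np.mean(data[column])
        std = np.std(data[column])
        # The loop iterates over the column as it was before filtering;
        # `data` is rebound to the filtered frame inside the loop.
        for i in data[column]:
            zscore = (i - mean) / std
            if np.abs(zscore) > threshold:
                count = count + 1
                data = data[data[column] != i]
    return data, count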
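
A vectorized variant of the same idea might look like the sketch below. It is not equivalent to the loop above in every detail: it computes all z-scores once on the unfiltered data, whereas the loop recomputes mean and std for each column after earlier columns have been filtered. The function name drop_outliers and the sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def drop_outliers(data, threshold=3.0):
    """Return (filtered frame, number of rows removed)."""
    # ddof=0 matches the population standard deviation that np.std computes.
    zscores = (data - data.mean()) / data.std(ddof=0)
    # A row is an outlier if any of its columns exceeds the threshold.
    is_outlier = (zscores.abs() > threshold).any(axis=1)
    return data[~is_outlier], int(is_outlier.sum())

# Hypothetical data: column "a" contains one extreme value.
df = pd.DataFrame({"a": [0.0] * 19 + [100.0], "b": np.arange(20.0)})
cleaned, removed = drop_outliers(df)
print(len(cleaned), removed)
```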