Get non-numerical rows in a column (pandas, Python)

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44140489/

Time: 2020-09-14 03:40:35  Source: igfitidea


Asked by Jessica

I checked this post: "finding non-numeric rows in dataframe in pandas?", but it doesn't really answer my question.

My sample data:

import pandas as pd

d = {
    'unit': ['UD', 'UD', 'UD', 'UD', 'UD', 'UD'],
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25],
}
df = pd.DataFrame(d)

It looks like this:

  N-D     num unit
0  Q1   -1.48   UD
1  Q2    1.70   UD
2  Q3   -6.18   UD
3  Q4    0.25   UD
4  Q5  sum(d)   UD
5  Q6    0.25   UD

I want to select only the rows whose value in column 'num' is NON-NUMERIC, keeping all of the columns for those rows.

Desired output:

  N-D     num unit
4  Q5  sum(d)   UD

My attempts:

import numpy as np
# Didn't work: it pulled out everything; besides, I want the condition to check only column 'num'.
nonnumeric = df[~df.applymap(np.isreal).all(1)]

# Didn't work: it pulled out all the rows, and only for column 'num'.
nonnumeric = df['num'][~df.applymap(np.isreal).all(1)]

Answered by jezrael

Use boolean indexing with a mask created by to_numeric + isnull.
Note: this solution does not find or filter numbers saved as strings, such as '1' or '22'. They are coerced to numbers, so their rows are not returned as non-numeric.

print (pd.to_numeric(df['num'], errors='coerce'))
0   -1.48
1    1.70
2   -6.18
3    0.25
4     NaN
5    0.25
Name: num, dtype: float64

print (pd.to_numeric(df['num'], errors='coerce').isnull())
0    False
1    False
2    False
3    False
4     True
5    False
Name: num, dtype: bool

print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
  N-D     num unit
4  Q5  sum(d)   UD
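
The three steps above can be wrapped into one small helper. This is only a sketch: the function name `non_numeric_rows` is hypothetical, and the extra `notna()` check is an assumption on my part (it keeps values that were already missing from being reported as non-numeric), not part of the answer.

```python
import pandas as pd


def non_numeric_rows(frame, column):
    """Return the rows of `frame` whose value in `column` cannot be parsed as a number."""
    # Anything that cannot be converted becomes NaN.
    converted = pd.to_numeric(frame[column], errors="coerce")
    # Assumption: values that were missing to begin with are not "non-numeric".
    mask = converted.isna() & frame[column].notna()
    return frame[mask]


df = pd.DataFrame({
    "unit": ["UD"] * 6,
    "N-D": ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
    "num": [-1.48, 1.7, -6.18, 0.25, "sum(d)", 0.25],
})
result = non_numeric_rows(df, "num")
print(result)

# Numeric strings such as '1' or '22' are coerced successfully, so only 'abc' is returned.
strings_only = pd.DataFrame({"num": ["1", "22", "abc"]})
print(non_numeric_rows(strings_only, "num"))
```
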

Another solution, with isinstance and apply:

print (df[df['num'].apply(lambda x: isinstance(x, str))])
  N-D     num unit
4  Q5  sum(d)   UD

Answered by Mehdi

Old topic, but if the numbers have been converted to 'str', type(x) == str does not work, because every value is then a string.

Instead, it's better to use isnumeric() or isdigit().

df = df[df['num'].apply(lambda x: not x.isnumeric())]

I tested all three approaches on my own dataframe with 200k+ rows, assuming numbers have been converted to 'str' by pd.read_csv().

def f1():
    df[pd.to_numeric(df['num'], errors='coerce').isnull()]

def f2():
    df[~df.num.str.match(r'^-?(\d*\.?\d+|\d+\.?\d*)$')]

def f3():
    df[df['num'].apply(lambda x: not x.isnumeric())]

I got the following execution times by running each function 10 times.

timeit.timeit(f1, number=10)
1.04128568888882

timeit.timeit(f2, number=10)
1.959099448888992

timeit.timeit(f3, number=10)
0.48741375999998127

Conclusion: the fastest method is isnumeric(); the slowest is the regular expression method.
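
One caveat worth checking before picking a method on speed alone: str.isnumeric() only accepts strings made up entirely of numeric characters, so it rejects a minus sign or a decimal point. The sketch below uses made-up sample values to compare it with the to_numeric approach; the three functions timed above may therefore not return the same rows.

```python
import pandas as pd

# Hypothetical sample values, all stored as strings.
values = pd.Series(["12", "-1.48", "1.7", "sum(d)"])

# str.isnumeric() is True only when every character is a numeric character.
is_numeric_method = values.apply(lambda x: x.isnumeric())

# pd.to_numeric parses signs and decimal points as well.
to_numeric_ok = pd.to_numeric(values, errors="coerce").notna()

comparison = pd.DataFrame({
    "value": values,
    "isnumeric": is_numeric_method,
    "to_numeric": to_numeric_ok,
})
print(comparison)
```

Here only '12' passes isnumeric(), while to_numeric accepts '12', '-1.48' and '1.7'.
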

Answered by plasmon360

I used

df = df[df['num'].apply(lambda x: type(x) == str)]

and now df is

  N-D     num unit
4  Q5  sum(d)   UD

Answered by piRSquared

Assuming these are strings, you can filter based on a regular expression match of a floating point number.

df[~df.num.str.match(r'^-?(\d*\.?\d+|\d+\.?\d*)$')]

  N-D     num unit
4  Q5  sum(d)   UD
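
A self-contained version of the regular expression filter might look like the sketch below. The constant name `FLOAT_PATTERN` and the `astype(str)` call are my own additions, assumed here so that the same code also runs when the column holds a mix of floats and strings. Note that this pattern only covers plain decimals; it does not match forms such as '1e5' or '+1'.

```python
import pandas as pd

# Optional minus sign, then digits with an optional decimal point.
FLOAT_PATTERN = r"^-?(\d*\.?\d+|\d+\.?\d*)$"

df = pd.DataFrame({
    "N-D": ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
    "num": ["-1.48", "1.7", "-6.18", "0.25", "sum(d)", "0.25"],
    "unit": ["UD"] * 6,
})

# Assumption: cast to str first so the .str accessor works on mixed-type columns too.
as_text = df["num"].astype(str)
non_numeric = df[~as_text.str.match(FLOAT_PATTERN)]
print(non_numeric)

# Scientific notation is not covered by this pattern.
matches_scientific = bool(pd.Series(["1e5"]).str.match(FLOAT_PATTERN).iloc[0])
```
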

Answered by ravibeli

There are many ways to detect non-numeric values in a column of a pandas DataFrame; here is one.

df[~df['num'].map(lambda x:x.isnumeric())]
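
This line assumes every value in the column is a string. On the question's original data, where most values are floats, calling isnumeric() raises an AttributeError. A minimal sketch of one way to guard against that, assuming that real numbers should simply count as numeric:

```python
import pandas as pd

df = pd.DataFrame({
    "unit": ["UD"] * 6,
    "N-D": ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
    "num": [-1.48, 1.7, -6.18, 0.25, "sum(d)", 0.25],
})

# Floats have no isnumeric() method, so the unguarded version fails on mixed data.
error_message = None
try:
    df[~df["num"].map(lambda x: x.isnumeric())]
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Guarded version: only strings are tested; floats and ints are treated as numeric.
non_numeric = df[df["num"].map(lambda x: isinstance(x, str) and not x.isnumeric())]
print(non_numeric)
```

Strings such as '-3' or '1.5' would still be reported as non-numeric by this check, since isnumeric() does not accept signs or decimal points.
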