Get non-numerical rows in a column pandas python

Question

Asked by Jessica

I checked this post: "finding non-numeric rows in dataframe in pandas?", but it doesn't really answer my question.

My sample data:

import pandas as pd


d = {
 'unit': ['UD', 'UD', 'UD', 'UD', 'UD','UD'],
 'N-D': [ 'Q1', 'Q2', 'Q3', 'Q4','Q5','Q6'],
 'num' : [ -1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25]

}
df = pd.DataFrame(d)

It looks like this:

  N-D     num unit
0  Q1   -1.48   UD
1  Q2    1.70   UD
2  Q3   -6.18   UD
3  Q4    0.25   UD
4  Q5  sum(d)   UD
5  Q6    0.25   UD

I want to keep only the rows whose value in column 'num' is NON-NUMERIC. For those rows, I want all of the columns.

Desired output:

  N-D     num unit
4  Q5  sum(d)   UD

My attempts:

nonnumeric=df[~df.applymap(np.isreal).all(1)] #didn't work, it pulled out everything, besides i want the condition to check only column 'num'. 

nonnumeric=df['num'][~df.applymap(np.isreal).all(1)] #didn't work, it pulled out all the rows for column 'num' only.

Answer 1

Answered by jezrael

Use boolean indexing with a mask created by to_numeric + isnull.
Note: This solution does not find or filter numbers saved as strings, such as '1' or '22'.

print (pd.to_numeric(df['num'], errors='coerce'))
0   -1.48
1    1.70
2   -6.18
3    0.25
4     NaN
5    0.25
Name: num, dtype: float64

print (pd.to_numeric(df['num'], errors='coerce').isnull())
0    False
1    False
2    False
3    False
4     True
5    False
Name: num, dtype: bool

print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
  N-D     num unit
4  Q5  sum(d)   UD

Another solution with isinstance and apply:

print (df[df['num'].apply(lambda x: isinstance(x, str))])
  N-D     num unit
4  Q5  sum(d)   UD

Answer 2

Answered by Mehdi

Old topic, but if the numbers have been converted to 'str', type(x) == str does not work, because every value is then a string.

Instead, it's better to use isnumeric() or isdigit().

df = df[df['num'].apply(lambda x: not x.isnumeric())]
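
One caveat worth checking before relying on this: str.isnumeric() is True only when every character in the string is a numeric character, so a minus sign or a decimal point makes it return False. The sketch below shows this on a few sample strings and compares it with a float()-based check. The helper `parses_as_float` is a hypothetical name introduced for illustration, and note that float() also accepts strings such as 'nan' and 'inf'.

```python
samples = ['25', '-1.48', '0.25', 'sum(d)']

# str.isnumeric() rejects anything containing '-' or '.', not just text like 'sum(d)'.
flags = {s: s.isnumeric() for s in samples}
print(flags)


def parses_as_float(text):
    """Hypothetical helper: True if float() accepts the string."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# The float()-based check treats signed and decimal strings as numeric.
parsed = {s: parses_as_float(s) for s in samples}
print(parsed)
```

If the column can contain negative or decimal values stored as strings, the isnumeric() filter would report them as non-numeric as well.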

I tested all three approaches on my own dataframe with 200k+ rows, assuming the numbers have been read in as 'str' by pd.read_csv().

def f1():
    df[pd.to_numeric(df['num'], errors='coerce').isnull()]

def f2():
    df[~df.num.str.match(r'^\-?(\d*\.?\d+|\d+\.?\d*)$')]

def f3():
    df[df['num'].apply(lambda x: not x.isnumeric())]

I got the following execution times by running each function 10 times.

timeit.timeit(f1, number=10)
1.04128568888882

timeit.timeit(f2, number=10)
1.959099448888992

timeit.timeit(f3, number=10)
0.48741375999998127

Conclusion: the fastest method is isnumeric(); the slowest is the regular expression method.
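
To reproduce a comparison like this, a minimal benchmark sketch could look as follows. The data here is a synthetic stand-in (an assumption, not the 200k-row dataframe used above) made of digit-only strings plus one non-numeric entry, so all three functions agree on the result. Timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Synthetic stand-in data (an assumption): digit-only strings plus one non-numeric entry.
values = [str(i) for i in range(1000)] + ['sum(d)']
df = pd.DataFrame({'num': values})

PATTERN = r'^-?(\d*\.?\d+|\d+\.?\d*)$'


def f1():
    # Coerce to numbers; anything unparseable becomes NaN.
    return df[pd.to_numeric(df['num'], errors='coerce').isnull()]


def f2():
    # Regular expression for an optionally signed decimal number.
    return df[~df['num'].str.match(PATTERN)]


def f3():
    # Per-element str.isnumeric(); only suitable for unsigned whole numbers.
    return df[df['num'].apply(lambda x: not x.isnumeric())]


for func in (f1, f2, f3):
    elapsed = timeit.timeit(func, number=10)
    print(f'{func.__name__}: {elapsed:.4f} s for 10 runs')
```

On data with negative or decimal strings, f3 would return more rows than f1 and f2, so speed alone is not a sufficient basis for choosing it.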

Answer 3

Answered by plasmon360

I used

df = df[df['num'].apply(lambda x: type(x) == str)]

and now df is

  N-D     num unit
4  Q5  sum(d)   UD

Answer 4

Answered by piRSquared

Assuming these are strings, you can filter based on a regular expression that matches a floating-point number.

df[~df.num.str.match(r'^\-?(\d*\.?\d+|\d+\.?\d*)$')]

  N-D     num unit
4  Q5  sum(d)   UD
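
For the mixed-type column from the question (floats plus one string), a sketch of the same idea is shown below. Casting the column to str first is an assumption made here so that the float entries can be matched as text. The pattern does not cover exponent notation such as '1e-05', so very small or very large floats would need a broader pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25],
    'unit': ['UD'] * 6,
})

# Raw string so the backslashes reach the regex engine unchanged:
# optional minus sign, then digits with an optional decimal point.
FLOAT_PATTERN = r'^-?(\d*\.?\d+|\d+\.?\d*)$'

# Cast every value to text so that .str.match can be applied to the whole column.
as_text = df['num'].astype(str)
non_numeric = df[~as_text.str.match(FLOAT_PATTERN)]
print(non_numeric)
```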

Answer 5

Answered by ravibeli

There are many ways to detect non-numeric values in a column of a pandas DataFrame; here is one.

df[~df['num'].map(lambda x:x.isnumeric())]
