Python: remove non-numeric rows in one column with pandas

Question

Asked by HungUnicorn

There is a dataframe like the following. It has one unclean column, 'id', which should be a numeric column.

id, name
1,  A
2,  B
3,  C
tt, D
4,  E
5,  F
de, G

Is there a concise way to remove the rows whose id values, tt and de, are not numeric

tt,D
de,G

to make the dataframe clean?

id, name
1,  A
2,  B
3,  C
4,  E
5,  F
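Why the column is "unclean" can be seen directly, assuming the data is loaded with `pd.read_csv` as in the first answer: because `tt` and `de` cannot be parsed as numbers, pandas does not give the `id` column a numeric dtype, and every value, including `1` and `2`, comes back as a Python string. A minimal sketch:

```python
import pandas as pd
from io import StringIO

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

# The mixed column is not parsed as numeric.
print(pd.api.types.is_numeric_dtype(df['id']))  # False
# Every entry, even "1", is a str rather than an int.
print(df['id'].map(type).unique())
```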

Answer 1

Accepted answer by Anton Protopopov

You could use the standard string method isnumeric and apply it to each value in your id column:

import pandas as pd
from io import StringIO

data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

In [55]: df
Out[55]: 
   id name
0   1    A
1   2    B
2   3    C
3  tt    D
4   4    E
5   5    F
6  de    G

In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]: 
  id name
0  1    A
1  2    B
2  3    C
4  4    E
5  5    F
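One caveat worth knowing (not part of the original answer): `str.isnumeric()` only accepts strings made entirely of numeric characters, so values such as `'-3'` or `'1.5'` are dropped too, and calling it on a non-string value, such as a `None` or `NaN` from an empty cell, raises `AttributeError`. A minimal sketch, assuming the column may contain missing values, guards the call with `isinstance`:

```python
import pandas as pd

# Hypothetical column with a negative number, a decimal and a missing value.
ids = pd.Series(['1', '-3', '1.5', 'tt', None], dtype=object)

# '-' and '.' are not numeric characters, so '-3' and '1.5' give False.
# The isinstance() guard avoids AttributeError on None/NaN.
mask = ids.map(lambda x: isinstance(x, str) and x.isnumeric())
print(mask.tolist())  # [True, False, False, False, False]
```

If negative or decimal ids should be kept, the `pd.to_numeric` approach from a later answer is a better fit.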

Or, if you want to use id as the index, you could do:

In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]: 
   name
id     
1     A
2     B
3     C
4     E
5     F
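After filtering, the `id` values are still strings (`'1'`, `'2'`, ...), so the index above holds strings, not integers. If integer ids are wanted, one option, sketched here as an addition to the answer, is to convert the column with `astype` before calling `set_index`:

```python
import pandas as pd
from io import StringIO

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))

# Keep only rows whose id consists of numeric characters.
clean = df[df['id'].apply(lambda x: x.isnumeric())]

# The surviving ids are still strings; convert them to int before indexing.
clean = clean.astype({'id': int}).set_index('id')
print(clean.index.tolist())  # [1, 2, 3, 4, 5]
```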

Edit: added timings.

Although the pd.to_numeric version does not use apply, it is almost two times slower than applying str.isnumeric to each value of a column of strings. I also added an option using the pandas .str.isnumeric accessor, which requires less typing and is still faster than pd.to_numeric. However, pd.to_numeric is more general because it works with any data type, not only strings.

In [3]: df_big = pd.concat([df]*10000)

In [4]: df_big.shape
Out[4]: (70000, 2)

In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
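The timings above come from IPython's `%timeit`. To repeat the comparison outside IPython, a rough sketch using the standard-library `timeit` module could look like this; it also checks that the three filters select the same rows. Absolute numbers depend on the machine and pandas version, so none are claimed here.

```python
import timeit
from io import StringIO

import pandas as pd

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))
df_big = pd.concat([df] * 10000)  # 70000 rows, as in the answer

methods = {
    'apply(x.isnumeric())': lambda: df_big[df_big.id.apply(lambda x: x.isnumeric())],
    '.str.isnumeric()': lambda: df_big[df_big.id.str.isnumeric()],
    'pd.to_numeric': lambda: df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()],
}

# All three filters should keep exactly the same rows.
results = {name: f() for name, f in methods.items()}
assert all(r.equals(results['pd.to_numeric']) for r in results.values())

for name, f in methods.items():
    # Average time per call over 10 runs, in milliseconds.
    ms = timeit.timeit(f, number=10) / 10 * 1000
    print(f'{name}: {ms:.1f} ms')
```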

Answer 2

Answer by DeepSpace

Given that df is your dataframe:

import numpy as np
df[df['id'].apply(lambda x: isinstance(x, (int, np.int64)))]

This passes each value in the id column to isinstance, which checks whether it is an int. The result is a boolean Series, which is then used to keep only the rows where it is True. Note that this only works when the column holds actual integer objects; values read from a CSV file, as in the question, are all strings, so this filter would keep no rows.
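To illustrate the difference between a column that holds real integers and one that holds numeric strings, here is a small sketch; the `mixed` frame is a hypothetical example built from Python lists:

```python
import numpy as np
import pandas as pd

# Built from Python objects: ints and strs coexist in an object column.
mixed = pd.DataFrame({'id': [1, 2, 3, 'tt', 4, 5, 'de'],
                      'name': list('ABCDEFG')})

keep = mixed['id'].apply(lambda x: isinstance(x, (int, np.int64)))
print(mixed[keep]['name'].tolist())  # ['A', 'B', 'C', 'E', 'F']

# The same values stored as text are all strings, so nothing matches.
as_text = mixed.astype({'id': str})
print(as_text['id'].apply(lambda x: isinstance(x, (int, np.int64))).any())  # False
```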

If you also need to account for float values, another option is:

import numpy as np
df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]

Note that neither approach modifies df in place, so you need to assign the result back to your original df or to a new variable:

df = df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]
# or
new_df = df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]
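The explicit type list misses other numeric types such as `np.int32` or `np.float32`. A more general variant, offered as an alternative rather than the answerer's code, checks against the standard-library abstract base class `numbers.Real` (NumPy registers its integer and floating types with these ABCs) and excludes booleans, since `bool` is a subclass of `int`. The helper name `is_real_number` is hypothetical:

```python
import numbers

import numpy as np
import pandas as pd

def is_real_number(x):
    """True for Python/NumPy ints and floats; False for strings and bools."""
    return isinstance(x, numbers.Real) and not isinstance(x, (bool, np.bool_))

df = pd.DataFrame({'id': [1, np.int32(2), 3.5, np.float32(4), 'tt', True, 'de'],
                   'name': list('ABCDEFG')})

new_df = df[df['id'].apply(is_real_number)]
print(new_df['name'].tolist())  # ['A', 'B', 'C', 'D']
```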

Answer 3

Answer by Zero

Using pd.to_numeric

In [1079]: df[pd.to_numeric(df['id'], errors='coerce').notnull()]
Out[1079]:
  id  name
0  1     A
1  2     B
2  3     C
4  4     E
5  5     F
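A possible extension of this answer: the mask keeps the right rows but leaves `id` as strings. If the cleaned column should also be numeric, the coerced result can be assigned back and the unparseable rows dropped. A sketch, assuming integer ids are wanted at the end:

```python
import pandas as pd
from io import StringIO

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))

# Non-numeric strings become NaN instead of raising an error.
numeric_id = pd.to_numeric(df['id'], errors='coerce')

# Replace id with its numeric version, drop rows that failed to parse,
# then cast to int (the NaNs had forced a float dtype).
clean = df.assign(id=numeric_id).dropna(subset=['id']).astype({'id': int})
print(clean['id'].tolist())  # [1, 2, 3, 4, 5]
```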

Answer 4

Answer by Matphy

x.isnumeric() does not return True when x holds a floating-point value such as '1.5'.

One way to keep only the values that can be converted to float:

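def is_float(x):
    # True if x can be converted to float, False otherwise.
    # TypeError covers non-string, non-numeric values such as None.
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

# The helper must be defined before it is used.
df[df['id'].apply(is_float)]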
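A quick usage sketch of the helper (redefined so the block runs on its own) shows that it keeps values such as `'1.5'` and `'-3'` that `isnumeric()` rejects. Note that `float()` also accepts strings like `'nan'` and `'inf'`, so such rows survive this filter, which may or may not be desired:

```python
import pandas as pd

def is_float(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True

df = pd.DataFrame({'id': ['1', '-3', '1.5', 'tt', 'nan', 'de'],
                   'name': list('ABCDEF')})

print(df[df['id'].apply(is_float)]['name'].tolist())  # ['A', 'B', 'C', 'E']
```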
