Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35682719/

Drop rows with a 'question mark' value in any column in a pandas dataframe

Tags: python, pandas, dataframe

Asked by Anonymous

I want to remove every row that contains a question mark symbol in any column (in other words, keep only the rows without one). I also want to convert the elements to float type.

Input:

X Y Z
0 1 ?
1 2 3
? ? 4
4 4 4
? 2 5

Output:

X Y Z
1 2 3
4 4 4

Preferably using pandas dataframe operations.
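For reference, the input table above could be built as a DataFrame roughly like this. This is only a sketch: the variable name df and the choice to store every cell as a string (as would happen when '?' appears in a column read from a text file) are assumptions, not something the question states.

```python
import pandas as pd

# Sample data from the question. Because '?' appears in every column,
# the cells are held as strings rather than numbers (an assumption).
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

print(df)
print(df.dtypes)
```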

Answered by jezrael

You can first find the string ? in each column, build a boolean mask, and then filter the rows with boolean indexing. If you also need to convert the columns to float, use astype:

print(~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?')))
0    False
1     True
2    False
3     True
4    False
dtype: bool


df1 = df[~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?'))].astype(float)
print(df1)
     X    Y    Z
1  1.0  2.0  3.0
3  4.0  4.0  4.0

print(df1.dtypes)
X    float64
Y    float64
Z    float64
dtype: object

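Alternatively, convert each column with pd.to_numeric and errors='coerce', which turns ? into NaN, and then keep only the rows that have no missing values: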

df['X'] = pd.to_numeric(df['X'], errors='coerce')
df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
df['Z'] = pd.to_numeric(df['Z'], errors='coerce')
print(df)
     X    Y    Z
0  0.0  1.0  NaN
1  1.0  2.0  3.0
2  NaN  NaN  4.0
3  4.0  4.0  4.0
4  NaN  2.0  5.0

print(df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull())
0    False
1     True
2    False
3     True
4    False
dtype: bool

print(df[df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull()].astype(float))
     X    Y    Z
1  1.0  2.0  3.0
3  4.0  4.0  4.0
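The per-column conversion above can probably be shortened by applying pd.to_numeric to every column at once and letting dropna remove the incomplete rows. The sketch below assumes the question's data is stored as strings. The names numeric and clean are illustrative only.

```python
import pandas as pd

# Sample data from the question, assumed to be stored as strings.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Coerce every column: anything that cannot be parsed as a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Drop every row that contains at least one NaN.
clean = numeric.dropna()
print(clean)
```

Note that errors='coerce' turns any non-numeric value into NaN, not only '?', so this variant fits best when '?' is the only non-numeric marker in the data.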

A better approach is to compare the whole DataFrame at once:

df = df[(df != '?').all(axis=1)]

Or:

df = df[~(df == '?').any(axis=1)]
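As a rough end-to-end illustration, the whole-DataFrame comparison can be combined with astype(float) on the question's sample data. The DataFrame construction and the names mask and clean are assumptions made for this sketch.

```python
import pandas as pd

# Sample data from the question, assumed to be stored as strings.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# True for rows in which no cell equals '?'.
mask = (df != '?').all(axis=1)

# Keep only those rows, then convert the remaining strings to floats.
clean = df[mask].astype(float)
print(clean)
```

Unlike the column-by-column mask, this form does not need the column names spelled out, so it keeps working if columns are added or renamed.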

Answered by Naidu Jithendra

You can try replacing ? with null values:

import numpy as np

data = df.replace("?", np.nan)

If you want to replace values in a particular column only, try this:

data = df["column name"].replace("?", np.nan)
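Replacing ? with NaN only marks the cells as missing. To get the output asked for in the question, the rows still have to be dropped and the values converted. One possible way to finish the job, sketched on the question's sample data (the DataFrame construction is an assumption):

```python
import numpy as np
import pandas as pd

# Sample data from the question, assumed to be stored as strings.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Mark '?' as missing, drop rows with any missing value, convert to float.
data = df.replace('?', np.nan).dropna().astype(float)
print(data)
```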