Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/49249860/

Time: 2020-09-14 05:19:04  Source: igfitidea

How to check if float pandas column contains only integer numbers?

python pandas floating-point precision

Asked by 00__00__00

I have a dataframe

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)

How can I make sure that the numbers in v are whole numbers? I am very concerned about rounding, truncation, and floating-point representation errors.

Answered by cs95

Comparison with astype(int)

Tentatively convert your column to int and test with np.array_equal:

np.array_equal(df.v, df.v.astype(int))
True
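
As a rough illustration of how this comparison behaves, the sketch below wraps it in a helper and tries it on a few columns. The helper name `is_whole_by_cast` and the sample values are made up for this example. It assumes the column has no NaN or infinite values, because `astype(int)` raises an error for those.

```python
import numpy as np
import pandas as pd


def is_whole_by_cast(s: pd.Series) -> bool:
    """Return True if casting to int leaves every value unchanged.

    Hypothetical helper; assumes no NaN or infinite values,
    since astype(int) raises an error for those.
    """
    return bool(np.array_equal(s, s.astype(int)))


whole = pd.Series(np.arange(10), dtype=float)   # 0.0, 1.0, ..., 9.0
not_whole = pd.Series([1.0, 2.5, 3.0])          # 2.5 truncates to 2
near_miss = pd.Series([1.0, 2.0000001])         # tiny fractional part

print(is_whole_by_cast(whole))
print(is_whole_by_cast(not_whole))
print(is_whole_by_cast(near_miss))
```

The cast truncates toward zero, so any value with a fractional part, however small, no longer equals its integer version and the comparison fails.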


float.is_integer

You can use this Python method in conjunction with apply:

df.v.apply(float.is_integer).all()
True

Or, using Python's all with a generator expression, for space efficiency:

all(x.is_integer() for x in df.v)
True
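
One thing worth checking with this approach is how missing values behave. The sketch below uses a made-up Series to show that `float.is_integer` returns False for NaN, so a column with nulls fails the test unless they are dropped first. Using `dropna()` here is my own choice for the illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample column (made up for this example): whole numbers plus one null
s = pd.Series([1.0, 2.0, np.nan, 4.0])

# NaN is not an integer according to float.is_integer
nan_is_integer = float("nan").is_integer()

# The NaN entry makes the whole check fail
all_including_nan = all(x.is_integer() for x in s)

# Dropping nulls first checks only the values that are present
all_ignoring_nan = all(x.is_integer() for x in s.dropna())

print(nan_is_integer, all_including_nan, all_ignoring_nan)
```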

Answered by mgoldwasser

If you want to check multiple float columns in your dataframe, you can do the following:

# applymap is named DataFrame.map in pandas 2.1 and later
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# assign whole columns so the integer dtype is kept
df[float_to_int_cols] = df[float_to_int_cols].astype(int)

Keep in mind that a float column containing only whole numbers will not be selected if it has np.NaN values. To cast float columns with missing values to integers, you need to fill or remove the missing values first, for example with median imputation:

float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round())  # median imputation
# applymap is named DataFrame.map in pandas 2.1 and later
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# assign whole columns so the integer dtype is kept
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
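
To make the multi-column idea concrete, here is a small self-contained sketch on a made-up DataFrame. The column names `a`, `b`, `c` and the variable names are hypothetical. It uses `apply` with `Series.map` instead of `applymap`, which is my substitution to keep the example independent of the pandas version.

```python
import pandas as pd

# Made-up data: "a" is whole-valued, "b" is not, "c" is not a float column
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.5, 2.0, 3.0],
    "c": ["x", "y", "z"],
})

float_cols = df.select_dtypes(include=["float"])

# One boolean per float column: True if every value is a whole number
col_should_be_int = float_cols.apply(
    lambda col: col.map(float.is_integer).all()
)
float_to_int_cols = col_should_be_int[col_should_be_int].index

# Replace the selected columns with integer versions
df[float_to_int_cols] = df[float_to_int_cols].astype(int)

print(df.dtypes)
```

Only column `a` is converted; `b` stays float because of the 1.5, and `c` is never considered.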

Answered by scott

Here's a simpler, and probably faster, approach:

(df[col] % 1 == 0).all()

To ignore nulls:

(df[col].fillna(-9999) % 1 == 0).all()
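
The following sketch shows the modulo check on a made-up DataFrame with nulls. The column names are hypothetical, and `dropna()` is used here as an alternative to filling with a sentinel such as -9999; that swap is my assumption, not part of the answer.

```python
import numpy as np
import pandas as pd

# Made-up data: both columns contain a null
df = pd.DataFrame({
    "whole": [1.0, -2.0, np.nan, 4.0],
    "mixed": [1.0, 2.5, np.nan, 4.0],
})

# NaN % 1 is NaN, and NaN == 0 is False, so the plain check fails on nulls
strict = (df["whole"] % 1 == 0).all()

# Dropping nulls first checks only the values that are present
whole_ignoring_nulls = (df["whole"].dropna() % 1 == 0).all()
mixed_ignoring_nulls = (df["mixed"].dropna() % 1 == 0).all()

print(strict, whole_ignoring_nulls, mixed_ignoring_nulls)
```

Both variants are vectorized, so they avoid calling a Python function once per element.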