What is Python's equivalent of R's NA?

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/28654325/

Time: 2020-08-19 03:33:44  Source: igfitidea

Tags: python, numpy, pandas, scikit-learn, data-scrubbing

Asked by power

What is python's equivalent of R's NA?

To be more specific: R has NaN, NA, NULL, Inf and -Inf. NA is generally used when there is missing data. What is python's equivalent?

How do libraries such as numpy and pandas handle missing values?

How does scikit-learn handle missing values?

Is it different for python 2.7 and python 3?

Accepted answer by Andreas Mueller

Scikit-learn doesn't handle missing values currently. For most machine learning algorithms, it is unclear how to handle missing values, so we rely on the user to handle them before passing the data to the algorithm. Numpy doesn't have a "missing" value. Pandas uses NaN, but inside numeric algorithms that might lead to confusion. It is possible to use masked arrays, but we don't do that in scikit-learn (yet).
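
As a rough illustration of handling missing values before the data reaches an algorithm, here is a minimal sketch using only numpy. The matrix `X` and the three options shown are assumptions made for the example, not something scikit-learn prescribes; which option is appropriate depends on the data.

```python
import numpy as np

# Hypothetical feature matrix; np.nan marks the missing entries.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Option 1: drop every row that contains at least one NaN.
complete_rows = ~np.isnan(X).any(axis=1)
X_dropped = X[complete_rows]

# Option 2: replace each NaN with the mean of its column (simple mean imputation).
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

# Option 3: a masked array hides the invalid entries from reductions such as mean().
X_masked = np.ma.masked_invalid(X)
masked_means = X_masked.mean(axis=0)

print(X_dropped)
print(X_imputed)
print(masked_means)
```

Options 1 and 2 produce ordinary arrays with no NaN left in them, which is the form an estimator that does not accept missing values would need.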

Answer by JAB

For pandas, take a look at this:

http://pandas.pydata.org/pandas-docs/dev/missing_data.html

pandas uses NaN. You can test for null values using isnull() or notnull(), drop them from a data frame using dropna(), etc. The equivalent for datetime objects is NaT.
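
A short sketch of those calls on a small data frame. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with one missing number and one missing timestamp.
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", None]),
})

missing = df.isnull()    # boolean frame, True where a value is missing
present = df.notnull()   # element-wise negation of isnull()
complete = df.dropna()   # keeps only the rows with no missing value

# The missing timestamp is stored as NaT rather than NaN.
last_when = df["when"].iloc[2]

print(missing)
print(complete)
print(pd.isnull(last_when))
```

Only the first row survives dropna() here, because each of the other two rows is missing one field.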

Answer by N1B4

nan in numpy is handled well by many functions:

>>> import numpy as np
>>> a = [1, np.nan, 2, 3]
>>> np.nanmean(a)
2.0
>>> np.nansum(a)
6.0
>>> np.isnan(a)
array([False,  True, False, False], dtype=bool)
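
For contrast, a small sketch of why the nan-aware variants matter: the ordinary reductions propagate nan, and nan never compares equal to itself, so it has to be detected with isnan. The variable names are chosen for this example only.

```python
import math
import numpy as np

a = np.array([1, np.nan, 2, 3])

plain_mean = np.mean(a)      # the ordinary reduction propagates the nan
aware_mean = np.nanmean(a)   # the nan-aware variant skips it

# nan is not equal to itself, so == cannot be used to find it.
equal_to_self = np.nan == np.nan
found = np.isnan(a)

print(math.isnan(plain_mean))
print(aware_mean)
print(equal_to_self)
print(found)
```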