What is Python's equivalent of R's NA?
Original question: http://stackoverflow.com/questions/28654325/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by power
What is Python's equivalent of R's NA?
To be more specific: R has NaN, NA, NULL, Inf and -Inf. NA is generally used when there is missing data. What is Python's equivalent?
How do libraries such as NumPy and pandas handle missing values?
How does scikit-learn handle missing values?
Is it different for Python 2.7 and Python 3?
Accepted answer by Andreas Mueller
Scikit-learn doesn't currently handle missing values. For most machine learning algorithms, it is unclear how missing values should be handled, so we rely on the user to handle them before passing the data to the algorithm. NumPy doesn't have a "missing" value. pandas uses NaN, but inside numeric algorithms that can lead to confusion. It is possible to use masked arrays, but we don't do that in scikit-learn (yet).
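To make the two options mentioned above concrete, here is a minimal sketch using only NumPy. The array `X` and the choice of filling gaps with column means are assumptions for illustration, not something the answer prescribes; any other way of handling the gaps before fitting would fit the same pattern.

```python
import numpy as np

# Hypothetical feature matrix with one missing entry, marked as NaN.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, 6.0]])

# NaN propagates through ordinary arithmetic, which is the kind of
# confusion mentioned above: a plain mean over the first column is NaN.
plain_mean = X[:, 0].mean()

# Option 1: a masked array hides the invalid entries from computations.
masked = np.ma.masked_invalid(X)
masked_col_means = masked.mean(axis=0)

# Option 2 (an assumed strategy): fill the gaps with column means
# before handing the data to an estimator.
col_means = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_means, X)
```

After this, `X_filled` contains no NaN and can be passed on like any other numeric array, while `masked` keeps the original values and just ignores the missing one.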
Answer by JAB
For pandas, take a look at this:
http://pandas.pydata.org/pandas-docs/dev/missing_data.html
pandas uses NaN. You can test for null values using isnull() or notnull(), drop them from a data frame using dropna(), etc. The equivalent for datetime objects is NaT.
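A short sketch of those calls in action might look like this. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing number and one missing timestamp.
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0],
    "when": pd.to_datetime(["2015-02-22", None, "2015-02-24"]),
})

missing_mask = df["value"].isnull()    # True where the value is missing
present_mask = df["value"].notnull()   # the complement of isnull()
cleaned = df.dropna()                  # drops rows containing any missing value

# Missing datetimes are stored as NaT and are also caught by isnull().
when_is_missing = df["when"].isnull()
```

Here `cleaned` keeps only the first and last rows, since the middle row is missing both its number and its timestamp.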
Answer by N1B4
nan in NumPy is handled well by many functions:
>>> import numpy as np
>>> a = [1, np.nan, 2, 3]
>>> np.nanmean(a)
2.0
>>> np.nansum(a)
6.0
>>> np.isnan(a)
array([False, True, False, False], dtype=bool)

