Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/44304419/

Time: 2020-09-14 03:42:48  Source: igfitidea

Max / Min of date column in Pandas, columns include nan values

Tags: python, date, pandas, dataframe

Asked by Nieumysl

I'm trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:

from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan,date(2000,11,1)],
                        [date(2000,12,1), date(2000,9,1)],
                        [date(2000,4,1),np.nan],
                        [date(2000,12,2),np.nan]], columns=['col1','col2'])

df['col3'] = df[['col1','col2']].max(axis=1)

I know it can be done with loc and a combination of <, >, isnull, and so on. But how can I make it work with the regular max/min functions?

Answered by EdChum

You're storing date objects in your columns; if you convert them to datetime, it works as expected:

In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df

Out[10]: 
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN

In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df

Out[11]: 
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
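
For reference, the same steps can be put together as one self-contained script. This is only a minimal sketch: it assumes the sample frame from the question, and the column names `latest` and `earliest` are hypothetical names chosen for illustration, not part of the original post.

```python
from datetime import date

import numpy as np
import pandas as pd

# Sample data from the question: datetime.date objects mixed with NaN,
# which pandas stores as object-dtype columns.
df = pd.DataFrame(
    data=[[np.nan, date(2000, 11, 1)],
          [date(2000, 12, 1), date(2000, 9, 1)],
          [date(2000, 4, 1), np.nan],
          [date(2000, 12, 2), np.nan]],
    columns=['col1', 'col2'],
)

date_cols = ['col1', 'col2']

# Convert each column to a datetime64 dtype; NaN becomes NaT.
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

# Row-wise max/min now skip the missing values (skipna defaults to True).
df['latest'] = df[date_cols].max(axis=1)      # hypothetical column name
df['earliest'] = df[date_cols].min(axis=1)    # hypothetical column name

print(df)
```

Selecting `date_cols` explicitly keeps the reduction limited to the two date columns, so other columns added later do not affect the result.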

If you simply did:


df['col3'] = df['col1'].max()

then, on the original date objects, this raises a TypeError: '>=' not supported between instances of 'float' and 'datetime.date'

The NaN values cause the dtype to be promoted to float, so NaN gets returned. If you had no missing values, it would work as expected; since you do have missing values, you should convert the dtype to datetime so that the missing values become NaT and max works correctly.
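
To see the effect of the conversion on a single column, a small check along these lines may help. It is a sketch under the assumption that the column holds datetime.date objects plus NaN, as in the question; the variable names `s` and `converted` are made up for illustration.

```python
from datetime import date

import numpy as np
import pandas as pd

# A column of datetime.date objects with a missing value is object dtype.
s = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1)])
print(s.dtype)

# After conversion the dtype is a datetime64 variant and NaN has become NaT.
converted = pd.to_datetime(s)
print(converted.dtype)
print(converted.isna().tolist())

# max() skips NaT by default and returns a Timestamp.
print(converted.max())
```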