Max / Min of date column in Pandas, columns include NaN values
Source: http://stackoverflow.com/questions/44304419/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and link to the original question.
Asked by Nieumysl
I'm trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:
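import numpy as np
import pandas as pd
from datetime import date

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])
# Row-wise maximum across the two date columns
df['col3'] = df[['col1', 'col2']].max(axis=1)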
I know it can be done with loc and a combination of <, >, isnull and so on. But how can I make it work with the regular max/min functions?
Answered by EdChum
You're storing date objects in your columns. If you convert them to datetime, it works as expected:
In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df
Out[10]:
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN
In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df
Out[11]:
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
If you simply did this on the original date columns, without converting:
df['col3'] = df['col1'].max()
this raises a TypeError: '>=' not supported between instances of 'float' and 'datetime.date'
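The same kind of failure can be reproduced without pandas. The following is a minimal sketch, assuming plain Python's built-in max applied to a list that mixes a float NaN with a date object. The names values and raised are illustrative only and do not appear in the original answer.

```python
from datetime import date

# A float NaN next to a date object, like one of the original columns
values = [float("nan"), date(2000, 12, 1)]

try:
    max(values)
    raised = False
except TypeError as exc:
    # float and datetime.date define no ordering between each other
    raised = True
    print(type(exc).__name__, exc)
```

The exact operator named in the message can differ from the pandas traceback, since pandas and the built-in max perform different comparisons internally.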
The NaN values cause the dtype to be promoted to float, so NaN gets returned. If you had no missing values, it would work as expected. If you do have missing values, you should convert the dtype to datetime so that the missing values are converted to NaT and max works correctly.
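Putting the answer together, here is a minimal end-to-end sketch, assuming the sample data from the question. It converts both columns with pd.to_datetime and then takes the row-wise maximum and minimum. The column names latest and earliest and the list date_cols are illustrative choices, not part of the original post.

```python
from datetime import date

import numpy as np
import pandas as pd

# Sample frame from the question: date objects mixed with float NaN
df = pd.DataFrame(
    data=[
        [np.nan, date(2000, 11, 1)],
        [date(2000, 12, 1), date(2000, 9, 1)],
        [date(2000, 4, 1), np.nan],
        [date(2000, 12, 2), np.nan],
    ],
    columns=["col1", "col2"],
)

date_cols = ["col1", "col2"]  # hypothetical helper list

# Convert each column to datetime64; NaN becomes NaT
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

# Row-wise max/min; missing values are skipped by default (skipna=True)
df["latest"] = df[date_cols].max(axis=1)
df["earliest"] = df[date_cols].min(axis=1)

print(df)
```

Because max and min skip missing values by default, a row yields NaT only when every date in that row is missing.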

