Warning: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44304419/
Max / Min of date column in Pandas, columns include nan values
Asked by Nieumysl
I'm trying to create a new column in a pandas DataFrame with the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:
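import numpy as np
import pandas as pd
from datetime import date
# Two columns of date objects, with NaN marking the missing values
df = pd.DataFrame(data=[[np.nan, date(2000,11,1)],
                        [date(2000,12,1), date(2000,9,1)],
                        [date(2000,4,1), np.nan],
                        [date(2000,12,2), np.nan]], columns=['col1','col2'])
df['col3'] = df[['col1','col2']].max(axis=1)  # col3 ends up all NaN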
I know it can be done with loc and a combination of <, >, isnull and so on. But how do I make it work with the regular max/min functions?
Answered by EdChum
You're storing date objects in your columns; if you convert them to datetime, it works as expected:
In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df
Out[10]:
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN
In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df
Out[11]:
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
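The same conversion also covers the minimum case from the question. Below is a minimal self-contained sketch, assuming a reasonably recent pandas; the column names col_max and col_min are illustrative and not from the original post, and converting both columns through apply is just one convenient way to do it.

```python
from datetime import date

import numpy as np
import pandas as pd

# Same sample data as in the question: date objects mixed with NaN.
df = pd.DataFrame(
    data=[[np.nan, date(2000, 11, 1)],
          [date(2000, 12, 1), date(2000, 9, 1)],
          [date(2000, 4, 1), np.nan],
          [date(2000, 12, 2), np.nan]],
    columns=['col1', 'col2'],
)

# Convert both columns in one pass; the NaN entries become NaT.
dates = df[['col1', 'col2']].apply(pd.to_datetime)

# Row-wise reductions skip NaT by default (skipna=True).
# col_max / col_min are hypothetical names chosen for this sketch.
df['col_max'] = dates.max(axis=1)
df['col_min'] = dates.min(axis=1)
print(df)
```

Only row 1 has two real dates, so it is the only row where col_max and col_min differ.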
If you simply did this on the original date objects:
df['col3'] = df['col1'].max()
this raises a TypeError: '>=' not supported between instances of 'float' and 'datetime.date'
The NaN values cause the dtype to be promoted to float, so NaN gets returned. If you had no missing values, it would work as expected; if you do have missing values, you should convert the dtype to datetime so that the missing values are converted to NaT and max works correctly.
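To see the conversion on a single column, here is a small sketch (the variable names raw and converted are illustrative). It checks how the column is stored before and after pd.to_datetime, and that the missing entry is NaT afterwards.

```python
from datetime import date

import numpy as np
import pandas as pd

# date objects plus a float NaN are stored with the generic object dtype.
raw = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1)])
print(raw.dtype)

# After conversion the column is a datetime column and the NaN is NaT.
converted = pd.to_datetime(raw)
print(pd.api.types.is_datetime64_any_dtype(converted))
print(converted.iloc[0] is pd.NaT)

# max() skips NaT by default and returns the latest real date.
latest = converted.max()
print(latest)
```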