Max/min of date columns in pandas when the columns contain NaN values

Question

Asked by Nieumysl

I'm trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:

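from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000,11,1)],
                        [date(2000,12,1), date(2000,9,1)],
                        [date(2000,4,1), np.nan],
                        [date(2000,12,2), np.nan]], columns=['col1','col2'])

# Row-wise maximum across the two date columns
df['col3'] = df[['col1','col2']].max(axis=1)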

I know it can be done with loc and a combination of <, >, isnull and so on. But how can I make it work with the regular max/min functions?

Answer 1

Answered by EdChum

You're storing date objects in your columns; if you convert them to datetime, it works as expected:

In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df

Out[10]: 
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN

In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df

Out[11]: 
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02

If you simply did this on the original column of date objects:

df['col3'] = df['col1'].max()

it raises a TypeError: '>=' not supported between instances of 'float' and 'datetime.date'

The NaN values cause the dtype to be promoted to float, so NaN gets returned. If you had no missing values, it would work as expected. If you do have missing values, you should convert the dtype to datetime so that the missing values become NaT and max works correctly.
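
As a self-contained sketch of the conversion described above, the snippet below rebuilds the question's DataFrame, converts both columns with pd.to_datetime, and then takes the row-wise max and min. The names date_cols, converted, latest and earliest are illustrative choices, not part of the original question or answer, and the exact behavior on unconverted date columns may differ between pandas versions.

```python
from datetime import date

import numpy as np
import pandas as pd

# Same data as in the question: datetime.date objects mixed with NaN
df = pd.DataFrame(
    data=[[np.nan, date(2000, 11, 1)],
          [date(2000, 12, 1), date(2000, 9, 1)],
          [date(2000, 4, 1), np.nan],
          [date(2000, 12, 2), np.nan]],
    columns=['col1', 'col2'])

# Hypothetical helper names, chosen for this example only
date_cols = ['col1', 'col2']

# Convert each column to datetime64; NaN becomes NaT
converted = df[date_cols].apply(pd.to_datetime)

# max/min skip NaT by default (skipna=True)
df['latest'] = converted.max(axis=1)
df['earliest'] = converted.min(axis=1)

# The column-wise maximum also works once the column is datetime64
col1_max = converted['col1'].max()

print(df)
print(col1_max)
```

Converting into a separate frame (converted) leaves the original columns untouched; assigning the converted columns back to df, as in the answer, works equally well if the date objects are no longer needed.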
