Find the minimum date in a Pandas DataFrame row and create a new column

Question

Asked by dartdog

I have a table with a number of date columns (some values will be NaN), and I need to find the oldest date in each row. A row may have DATE_MODIFIED, WITHDRAWN_DATE, SOLD_DATE, STATUS_DATE, etc.

So for each row there will be a date in one or more of the fields. I want to find the oldest of those and put it in a new column in the DataFrame.

Something like this: if I use just one column, e.g. DATE_MODIFIED, I get a result, but when I add a second one as below:

table['END_DATE']=min([table['DATE_MODIFIED']],[table['SOLD_DATE']])

I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

For that matter, will this construct work to find the min date, assuming I create correct date columns initially?

Answer 1

Answered by Viktor Kerkez

Just apply the min function along axis=1.

In [1]: import pandas as pd 
In [2]: df = pd.read_csv('test.cvs', parse_dates=['d1', 'd2', 'd3'])
In [3]: df.loc[2, 'd1'] = None
In [4]: df.loc[1, 'd2'] = None
In [5]: df.loc[4, 'd3'] = None
In [6]: df
Out[6]:
                   d1                  d2                  d3
0 2013-02-07 00:00:00 2013-03-08 00:00:00 2013-05-21 00:00:00
1 2013-02-07 00:00:00                 NaT 2013-05-21 00:00:00
2                 NaT 2013-03-02 00:00:00 2013-05-21 00:00:00
3 2013-02-04 00:00:00 2013-03-08 00:00:00 2013-01-04 00:00:00
4 2013-02-01 00:00:00 2013-03-06 00:00:00                 NaT
In [7]: df.min(axis=1)
Out[7]:
0   2013-02-07 00:00:00
1   2013-02-07 00:00:00
2   2013-03-02 00:00:00
3   2013-01-04 00:00:00
4   2013-02-01 00:00:00
dtype: datetime64[ns]
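
The session above reads its data from a CSV file that is not shown. As a minimal sketch, assuming a reasonably recent pandas, the same frame can be built inline so the row-wise min can be tried without that file. The literal values below are copied from the Out[6] display; building the frame with pd.to_datetime is an assumption of this sketch, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame inline; None becomes NaT after pd.to_datetime.
df = pd.DataFrame({
    "d1": pd.to_datetime(["2013-02-07", "2013-02-07", None, "2013-02-04", "2013-02-01"]),
    "d2": pd.to_datetime(["2013-03-08", None, "2013-03-02", "2013-03-08", "2013-03-06"]),
    "d3": pd.to_datetime(["2013-05-21", "2013-05-21", "2013-05-21", "2013-01-04", None]),
})

# Row-wise minimum; missing values (NaT) are skipped by default (skipna=True).
row_min = df.min(axis=1)
print(row_min)
```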

Answer 2

Answered by Felix Zumstein

If table is your DataFrame, then use its min method on the relevant columns:

table['END_DATE'] = table[['DATE_MODIFIED','SOLD_DATE']].min(axis=1)
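
Python's built-in min compares its two list arguments as whole objects, and comparing lists that contain Series ends up asking for the truth value of an element-wise comparison, which is what raises the ValueError in the question. The DataFrame min method works element-wise per row instead. The sketch below is one way to put this together, assuming the date columns start out as strings; the sample values and the errors="coerce" conversion step are illustrative assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing date.
table = pd.DataFrame({
    "DATE_MODIFIED": ["2013-02-07", None, "2013-02-04"],
    "SOLD_DATE": ["2013-03-08", "2013-03-02", None],
})

date_cols = ["DATE_MODIFIED", "SOLD_DATE"]

# Make sure every date column is a real datetime column;
# values that cannot be parsed become NaT.
for col in date_cols:
    table[col] = pd.to_datetime(table[col], errors="coerce")

# Oldest date per row across the selected columns, ignoring NaT.
table["END_DATE"] = table[date_cols].min(axis=1)
print(table)
```

More columns (WITHDRAWN_DATE, STATUS_DATE, and so on) can be added to date_cols in the same way.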

Answer 3

Answered by fccoelho

A slight variation on Felix Zumstein's answer:

table['END_DATE'] = table[['DATE_MODIFIED','SOLD_DATE']].min(axis=1).astype('datetime64[ns]')

The astype('datetime64[ns]') is necessary in the current version of pandas (July 2015) to avoid getting a float64 representation of the dates.
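
Whether the cast is still needed depends on the pandas version in use. A hedged sketch that only converts when the result did not come back as a datetime dtype is shown here; the sample data is hypothetical, and the dtype check via pandas.api.types.is_datetime64_any_dtype is a choice made for this sketch rather than part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample data with one missing value per column.
table = pd.DataFrame({
    "DATE_MODIFIED": pd.to_datetime(["2015-07-01", None, "2015-06-15"]),
    "SOLD_DATE": pd.to_datetime(["2015-06-20", "2015-07-03", None]),
})

end_date = table[["DATE_MODIFIED", "SOLD_DATE"]].min(axis=1)

# Cast only if the reduction did not already yield a datetime dtype.
if not is_datetime64_any_dtype(end_date):
    end_date = end_date.astype("datetime64[ns]")

table["END_DATE"] = end_date
print(table.dtypes)
```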

