pandas: Comparing pandas dataframes of different lengths

Question

Asked by Simon

I have two dataframes of different lengths, both indexed by date. I need both dataframes to have the same dates, i.e. delete the extra entries in the longer dataframe.

I have found that I can reset the index to turn the dates into a regular column, take that column as a pandas Series, and compare it with the other Series. This gives me a Series containing only the entries that are also in the shorter dataframe:

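df1 = ...  # assumed: the longer dataframe, indexed by 'Date'
df2 = ...  # assumed: the shorter dataframe, indexed by 'Date'
dfadj = df1.reset_index(['Date'])
dfindex = df2.reset_index(['Date'])  # assumed: dfindex is df2 with 'Date' as a column
dfstock = dfadj['Date'][dfadj['Date'].isin(dfindex['Date'])]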

But then I would need to find the index positions of these values and, in another step, delete them from the longer dataframe. Am I missing a completely different approach that would be more logical and/or simpler?
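
For reference, the isin comparison from this attempt can be applied to the index itself, which yields a boolean mask aligned with the rows, so no positional lookup or reset_index step is needed. A minimal sketch, assuming two small hypothetical frames (df1 covering 10 days, df2 covering 5 of them):

```python
import pandas as pd

# Hypothetical example data: df1 covers 10 days, df2 (the shorter frame) covers 5 of them.
df1 = pd.DataFrame({'a': range(10)},
                   index=pd.date_range('2015-02-24', periods=10, name='Date'))
df2 = pd.DataFrame({'b': range(5)},
                   index=pd.date_range('2015-02-26', periods=5, name='Date'))

# Index.isin returns a boolean array with one entry per row of df1,
# so the rows can be filtered directly without looking up positions.
mask = df1.index.isin(df2.index)
df1_trimmed = df1[mask]

print(df1_trimmed)
```

This keeps only the rows of df1 whose dates also appear in df2; dates that exist only in df2 are not added.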

Answer 1

Answered by jezrael

You can use Index.intersection and then select the matching rows of df2 with loc:

idx = df2.index.intersection(df1.index)
print (idx)
DatetimeIndex(['2015-02-24', '2015-02-25', '2015-02-26', '2015-02-27',
               '2015-02-28', '2015-03-01', '2015-03-02', '2015-03-03',
               '2015-03-04', '2015-03-05'],
              dtype='datetime64[ns]', freq='D')

print (df2.loc[idx])
             b
2015-02-24  10
2015-02-25  11
2015-02-26  12
2015-02-27  13
2015-02-28  14
2015-03-01  15
2015-03-02  16
2015-03-03  17
2015-03-04  18
2015-03-05  19
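
The same idea as a self-contained, runnable sketch that trims both frames to their common dates. It uses the sample frames defined later in this answer and label-based .loc selection, since the older .ix indexer was removed in pandas 1.0:

```python
import pandas as pd

# Same sample frames as in this answer: df1 has 10 days, df2 has 20 days.
rng1 = pd.date_range('2015-02-24', periods=10)
rng2 = pd.date_range('2015-02-24', periods=20)
df1 = pd.DataFrame({'a': range(10)}, index=rng1)
df2 = pd.DataFrame({'b': range(10, 30)}, index=rng2)

# Dates present in both frames.
idx = df2.index.intersection(df1.index)

# Label-based selection with .loc; every label in idx exists in both frames.
df1_common = df1.loc[idx]
df2_common = df2.loc[idx]

print(df2_common)
```

Because idx only contains dates found in both indexes, neither .loc call can raise a KeyError for a missing label.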

Another solution is to use merge with an inner join. Since how='inner' is the default, it can be omitted:

df = pd.merge(df1,df2, left_index=True, right_index=True)
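
For index-on-index joins, DataFrame.join with how='inner' should give the same result as this merge call. A small sketch comparing the two, assuming the sample frames from this answer (note that join defaults to how='left', so the inner join must be requested explicitly):

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10)}, index=pd.date_range('2015-02-24', periods=10))
df2 = pd.DataFrame({'b': range(10, 30)}, index=pd.date_range('2015-02-24', periods=20))

# merge on both indexes; how='inner' is the default, written out here for clarity.
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')

# DataFrame.join aligns on the index; how='inner' keeps only the shared dates.
joined = df1.join(df2, how='inner')

print(merged.equals(joined))
```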

Sample:

import pandas as pd

rng1 = pd.date_range(pd.to_datetime('2015-02-24'), periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng1)
print (df1)
            a
2015-02-24  0
2015-02-25  1
2015-02-26  2
2015-02-27  3
2015-02-28  4
2015-03-01  5
2015-03-02  6
2015-03-03  7
2015-03-04  8
2015-03-05  9

rng2 = pd.date_range(pd.to_datetime('2015-02-24'), periods=20)
df2 = pd.DataFrame({'b': range(10,30)}, index=rng2)
print (df2)
             b
2015-02-24  10
2015-02-25  11
2015-02-26  12
2015-02-27  13
2015-02-28  14
2015-03-01  15
2015-03-02  16
2015-03-03  17
2015-03-04  18
2015-03-05  19
2015-03-06  20
2015-03-07  21
2015-03-08  22
2015-03-09  23
2015-03-10  24
2015-03-11  25
2015-03-12  26
2015-03-13  27
2015-03-14  28
2015-03-15  29

df = pd.merge(df1,df2, left_index=True, right_index=True)
print (df)
            a   b
2015-02-24  0  10
2015-02-25  1  11
2015-02-26  2  12
2015-02-27  3  13
2015-02-28  4  14
2015-03-01  5  15
2015-03-02  6  16
2015-03-03  7  17
2015-03-04  8  18
2015-03-05  9  19
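
If the goal is to keep two separate dataframes with the same dates (as in the question) rather than one combined frame, DataFrame.align with join='inner' is a different method from the ones above that returns both frames trimmed to their shared index. A sketch using the same sample frames:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10)}, index=pd.date_range('2015-02-24', periods=10))
df2 = pd.DataFrame({'b': range(10, 30)}, index=pd.date_range('2015-02-24', periods=20))

# align with join='inner' on axis=0 trims both frames to their shared row labels
# and returns them as a tuple; each frame keeps only its own columns.
left, right = df1.align(df2, join='inner', axis=0)

print(left.shape, right.shape)
```

Restricting align to axis=0 leaves the columns untouched; without it, align would also join the column labels and fill the missing columns with NaN.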

Finally, if you need to delete some columns, use drop:

print (df.drop(['a'], axis=1))
             b
2015-02-24  10
2015-02-25  11
2015-02-26  12
2015-02-27  13
2015-02-28  14
2015-03-01  15
2015-03-02  16
2015-03-03  17
2015-03-04  18
2015-03-05  19
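
As a quick consistency check on the two approaches, merging and then dropping df1's column should produce the same frame as selecting the intersection of dates from df2 directly. A sketch, assuming the sample frames above:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10)}, index=pd.date_range('2015-02-24', periods=10))
df2 = pd.DataFrame({'b': range(10, 30)}, index=pd.date_range('2015-02-24', periods=20))

# Route 1: inner merge on the indexes, then drop df1's column.
via_merge = pd.merge(df1, df2, left_index=True, right_index=True).drop(['a'], axis=1)

# Route 2: select the shared dates from df2 directly.
via_intersection = df2.loc[df2.index.intersection(df1.index)]

print(via_merge.equals(via_intersection))
```

When only the rows of one frame are needed, the intersection route avoids building the combined frame first.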