Original URL: http://stackoverflow.com/questions/40131281/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Comparing pandas dataframes of different length
Question by Simon
I have two dataframes of different lengths, both indexed by date. I need both dataframes to have the same dates, i.e. delete the extra entries in the longer dataframe.
I have found that I can reset the index to turn it into a regular column, select that column as a pandas Series, and compare it with the other Series. This gives me a pandas Series containing only the entries that are also in the shorter dataframe:
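import pandas as pd
df1 = ...  # longer frame, DatetimeIndex named 'Date'
df2 = ...  # shorter frame, same kind of index
dfadj = df1.reset_index(['Date'])
dfindex = df2.reset_index(['Date'])  # assumption: 'dfindex' (undefined in the question) is df2 with 'Date' as a column
# Dates of df1 that also occur in df2 (a Series of dates, not the rows of df1)
dfstock = dfadj['Date'][dfadj['Date'].isin(dfindex['Date'])]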
But then I would need to find the index positions of these values and, in another step, delete those rows from the longer dataframe. Am I missing a completely different approach that would be more logical and/or simpler?
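The isin idea from the question can be applied directly to the index, which removes both the reset_index step and the separate position lookup. A minimal sketch, assuming both frames have a DatetimeIndex; the sample data and the names df_long, df_short and mask are illustrative only:

```python
import pandas as pd

# Illustrative data: a 20-day frame and a 10-day frame sharing the first 10 dates
df_long = pd.DataFrame({'b': range(10, 30)},
                       index=pd.date_range('2015-02-24', periods=20))
df_short = pd.DataFrame({'a': range(10)},
                        index=pd.date_range('2015-02-24', periods=10))

# Boolean mask: True for each row of df_long whose date also occurs in df_short
mask = df_long.index.isin(df_short.index)

# Boolean indexing keeps only the matching rows; no positions are needed
df_long_trimmed = df_long[mask]
print(df_long_trimmed.index.equals(df_short.index))  # True
```

Because the mask is built from index labels, the result is the trimmed frame itself rather than a Series of dates.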
Answer by jezrael
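You can use Index.intersection and then select the matching rows of df2 by label. The original answer used ix, which was deprecated in pandas 0.20 and removed in 1.0; loc does the same label-based selection here: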
idx = df2.index.intersection(df1.index)
print (idx)
DatetimeIndex(['2015-02-24', '2015-02-25', '2015-02-26', '2015-02-27',
'2015-02-28', '2015-03-01', '2015-03-02', '2015-03-03',
'2015-03-04', '2015-03-05'],
dtype='datetime64[ns]', freq='D')
print (df2.loc[idx])  # originally df2.ix[idx]; .ix was removed in pandas 1.0
b
2015-02-24 10
2015-02-25 11
2015-02-26 12
2015-02-27 13
2015-02-28 14
2015-03-01 15
2015-03-02 16
2015-03-03 17
2015-03-04 18
2015-03-05 19
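Since the question asks for both frames to end up with the same dates, the same intersection can be used to trim each of them. A sketch that rebuilds the sample data from this answer; the names common, df1_trim and df2_trim are illustrative:

```python
import pandas as pd

rng1 = pd.date_range('2015-02-24', periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng1)
rng2 = pd.date_range('2015-02-24', periods=20)
df2 = pd.DataFrame({'b': range(10, 30)}, index=rng2)

# Dates present in both frames
common = df1.index.intersection(df2.index)

# .loc selects rows by label; both frames now share exactly the same index
df1_trim = df1.loc[common]
df2_trim = df2.loc[common]

print(len(df1_trim), len(df2_trim))  # 10 10
```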
Another solution is to use merge with an inner join. Inner is the default, so how='inner' can be omitted:
df = pd.merge(df1,df2, left_index=True, right_index=True)
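If the two frames should stay separate instead of being merged into one, DataFrame.align with join='inner' returns both of them reindexed to the shared dates in a single call. The axis=0 argument matters: without it, align would also take the intersection of the column labels, which is empty for these frames ('a' vs 'b'), so both results would have no columns. A sketch that rebuilds the sample frames from this answer; left and right are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10)},
                   index=pd.date_range('2015-02-24', periods=10))
df2 = pd.DataFrame({'b': range(10, 30)},
                   index=pd.date_range('2015-02-24', periods=20))

# join='inner' keeps only index labels found in both frames;
# axis=0 restricts the alignment to the rows (the dates)
left, right = df1.align(df2, join='inner', axis=0)

print(left.shape, right.shape)  # (10, 1) (10, 1)
```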
Sample:
import pandas as pd
rng1 = pd.date_range(pd.to_datetime('2015-02-24'), periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng1)
print (df1)
a
2015-02-24 0
2015-02-25 1
2015-02-26 2
2015-02-27 3
2015-02-28 4
2015-03-01 5
2015-03-02 6
2015-03-03 7
2015-03-04 8
2015-03-05 9
rng2 = pd.date_range(pd.to_datetime('2015-02-24'), periods=20)
df2 = pd.DataFrame({'b': range(10,30)}, index=rng2)
print (df2)
b
2015-02-24 10
2015-02-25 11
2015-02-26 12
2015-02-27 13
2015-02-28 14
2015-03-01 15
2015-03-02 16
2015-03-03 17
2015-03-04 18
2015-03-05 19
2015-03-06 20
2015-03-07 21
2015-03-08 22
2015-03-09 23
2015-03-10 24
2015-03-11 25
2015-03-12 26
2015-03-13 27
2015-03-14 28
2015-03-15 29
df = pd.merge(df1,df2, left_index=True, right_index=True)
print (df)
a b
2015-02-24 0 10
2015-02-25 1 11
2015-02-26 2 12
2015-02-27 3 13
2015-02-28 4 14
2015-03-01 5 15
2015-03-02 6 16
2015-03-03 7 17
2015-03-04 8 18
2015-03-05 9 19
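To see which dates the inner merge discards, one option is an outer merge with indicator=True, which adds a _merge column marking each row as 'both', 'left_only' or 'right_only'. A sketch with the same sample frames; check and extra are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10)},
                   index=pd.date_range('2015-02-24', periods=10))
df2 = pd.DataFrame({'b': range(10, 30)},
                   index=pd.date_range('2015-02-24', periods=20))

# Outer join keeps every date; _merge records which frame each row came from
check = pd.merge(df1, df2, left_index=True, right_index=True,
                 how='outer', indicator=True)

# Dates that exist only in df2, i.e. the rows an inner merge would drop
extra = check[check['_merge'] == 'right_only'].index
print(len(extra))  # 10
```

Note that the outer join fills the missing 'a' values with NaN, so column 'a' becomes float in check; this does not affect the inner merge above.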
Finally, if you need to delete some columns, use drop:
print (df.drop(['a'], axis=1))
b
2015-02-24 10
2015-02-25 11
2015-02-26 12
2015-02-27 13
2015-02-28 14
2015-03-01 15
2015-03-02 16
2015-03-03 17
2015-03-04 18
2015-03-05 19