Original URL: http://stackoverflow.com/questions/19017350/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python
Asked by knight2270
I have a pandas DataFrame with date_time/voltage data like this (df1):
   Date_Time       Chan
0  20130401 9:00   AAT
1  20130401 10:00  AAT
2  20130401 11:00  AAT
3  20130401 12:00  AAT
4  20130401 13:00  AAT
5  20130401 14:00  AAT
6  20130401 15:00  AAT
I am using this as a prototype to load in data from a much bigger data file and create one DataFrame. The other DataFrame looks like this (df2):
Chan  date_time          Sens1  Sens2
AAC   01-Apr-2013 09:00   5.17   1281
AAC   01-Apr-2013 10:00   5.01    500
AAC   01-Apr-2013 12:00   5.17    100
AAC   01-Apr-2013 13:00   5.19  41997
AAC   01-Apr-2013 16:00   5.21   2123
AAT   01-Apr-2013 09:00  28.82    300
AAT   01-Apr-2013 10:00  28.35   4900
AAT   01-Apr-2013 12:00  28.04    250
AAE   01-Apr-2013 11:00   3.36    400
AAE   01-Apr-2013 12:00   3.41    200
AAE   01-Apr-2013 13:00   3.40   2388
AAE   01-Apr-2013 14:00   3.37    300
AAE   01-Apr-2013 15:00   3.35    500
AXN   01-Apr-2013 09:00  23.96   6643
AXN   01-Apr-2013 10:00  24.03   1000
AXW   01-Apr-2013 11:00  46.44   2343
So what I want to do is search df2 for every row that matches both columns of df1 (noting the different date formats) and insert the data from df2 into df1, like this (df1):
   Date_Time       Chan  Sens1  Sens2
0  20130401 9:00   AAT   28.82    300
1  20130401 10:00  AAT   28.35   4900
2  20130401 11:00  AAT     NaN    NaN
3  20130401 12:00  AAT   28.04    250
4  20130401 13:00  AAT     NaN    NaN
5  20130401 14:00  AAT     NaN    NaN
6  20130401 15:00  AAT     NaN    NaN
Could you give me some suggestions for the python/pandas code to match this pseudocode:
if (df1['Date_Time'] == df2['date_time']) & (df1['Chan'] == df2['Chan']):
    df1['Sens1'] = df2['Sens1']
    df1['Sens2'] = df2['Sens2']
If it affects the answer, I intend to bfill and ffill the NaNs, then add this DataFrame to a Panel, and then repeat with another channel name in place of AAT.
Answered by Andy Hayden
You can use a plain ol' merge to do this. But first, you should do a little cleanup of your DataFrames to make sure your datetime columns are actually datetimes rather than strings (note: it may be better to do this when reading the data from CSV or wherever it comes from):
import pandas as pd
# Parse both string columns into real datetime values
df1['Date_Time'] = pd.to_datetime(df1['Date_Time'], format='%Y%m%d %H:%M')
df2['date_time'] = pd.to_datetime(df2['date_time'])
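As a minimal sketch of the cleanup step, the snippet below parses both columns with an explicit format string. The two small frames are stand-ins built from the first rows in the question, and the `'%d-%b-%Y %H:%M'` format for df2 is an assumption read off the sample data (it also assumes English month abbreviations).

```python
import pandas as pd

# Small stand-ins for the two frames in the question (values copied from it).
df1 = pd.DataFrame({
    "Date_Time": ["20130401 9:00", "20130401 10:00"],
    "Chan": ["AAT", "AAT"],
})
df2 = pd.DataFrame({
    "Chan": ["AAT", "AAT"],
    "date_time": ["01-Apr-2013 09:00", "01-Apr-2013 10:00"],
    "Sens1": [28.82, 28.35],
    "Sens2": [300, 4900],
})

# Spell out both formats so nothing is left to inference.
# The df2 format is an assumption based on the sample rows.
df1["Date_Time"] = pd.to_datetime(df1["Date_Time"], format="%Y%m%d %H:%M")
df2["date_time"] = pd.to_datetime(df2["date_time"], format="%d-%b-%Y %H:%M")

# Both columns now hold datetime values and can be compared directly.
print(df1["Date_Time"].dtype)
print((df1["Date_Time"] == df2["date_time"]).all())
```

Once both columns share a datetime dtype, the differing string layouts no longer matter for matching.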
Let's also give the datetime columns the same name:
df1.rename(columns={'Date_Time': 'Datetime'}, inplace=True)
df2.rename(columns={'date_time': 'Datetime'}, inplace=True)
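The renaming matters because `merge` called without `on` joins on the column names the two frames share. The sketch below illustrates this with one-row stand-in frames; the names `left`, `right`, `shared_before`, and `shared_after` are hypothetical and only used here.

```python
import pandas as pd

# One-row stand-ins with the column names used in the question.
df1 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2013-04-01 09:00"]),
    "Chan": ["AAT"],
})
df2 = pd.DataFrame({
    "Chan": ["AAT"],
    "date_time": pd.to_datetime(["2013-04-01 09:00"]),
    "Sens1": [28.82],
    "Sens2": [300],
})

# Before renaming, the only shared column name is 'Chan'.
shared_before = sorted(set(df1.columns) & set(df2.columns))

# rename() without inplace returns new frames and leaves the originals alone.
left = df1.rename(columns={"Date_Time": "Datetime"})
right = df2.rename(columns={"date_time": "Datetime"})
shared_after = sorted(set(left.columns) & set(right.columns))

print(shared_before)  # ['Chan']
print(shared_after)   # ['Chan', 'Datetime']

# Naming the keys explicitly avoids relying on the shared-name default.
merged = left.merge(right, on=["Datetime", "Chan"])
print(merged)
```

Passing `on=` explicitly is optional here, but it keeps the join keys stable if either frame later gains another column with a common name.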
Now a simple merge will give you what you're after:
In [11]: df1.merge(df2)
Out[11]:
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 12:00:00  AAT  28.04    250

In [12]: df1.merge(df2, how='left')
Out[12]:
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 11:00:00  AAT    NaN    NaN
3 2013-04-01 12:00:00  AAT  28.04    250
4 2013-04-01 13:00:00  AAT    NaN    NaN
5 2013-04-01 14:00:00  AAT    NaN    NaN
6 2013-04-01 15:00:00  AAT    NaN    NaN
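The first call is an inner join, so only matching rows survive. `how='left'` keeps every row of df1 and fills the unmatched ones with NaN, which is the layout asked for in the question. The sketch below reproduces the left merge end to end on a reduced copy of the data, then applies the ffill/bfill step the question mentions. The `validate` argument and the `filled` name are additions for illustration, not part of the original answer.

```python
import pandas as pd

# df1: the hourly skeleton for one channel, 09:00 to 15:00.
df1 = pd.DataFrame({
    "Datetime": pd.to_datetime([f"2013-04-01 {h:02d}:00" for h in range(9, 16)]),
    "Chan": ["AAT"] * 7,
})

# df2: a reduced copy of the sensor table from the question.
df2 = pd.DataFrame({
    "Chan": ["AAC", "AAT", "AAT", "AAT", "AAE"],
    "Datetime": pd.to_datetime([
        "2013-04-01 09:00", "2013-04-01 09:00", "2013-04-01 10:00",
        "2013-04-01 12:00", "2013-04-01 11:00",
    ]),
    "Sens1": [5.17, 28.82, 28.35, 28.04, 3.36],
    "Sens2": [1281, 300, 4900, 250, 400],
})

# how="left" keeps all seven df1 rows; unmatched rows get NaN.
# validate is an optional guard that raises if df2 repeats a (Datetime, Chan) key.
merged = df1.merge(df2, how="left", on=["Datetime", "Chan"], validate="many_to_one")

# Fill gaps forward, then backward to cover any leading NaNs.
# Sens2 is float here because the merge introduced NaNs.
filled = merged.copy()
filled[["Sens1", "Sens2"]] = filled[["Sens1", "Sens2"]].ffill().bfill()
print(filled)
```

To repeat this for another channel, only the `Chan` values in df1 need to change; the merge keys select the matching rows from df2.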

