Note: this page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/19017350/

Date: 2020-09-13 21:11:39  Source: igfitidea

Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python

python-3.x pandas

Asked by knight2270

I have a pandas DataFrame with date_time/voltage data like this (df1):

        Date_Time  Chan
0   20130401 9:00   AAT
1  20130401 10:00   AAT
2  20130401 11:00   AAT
3  20130401 12:00   AAT
4  20130401 13:00   AAT
5  20130401 14:00   AAT
6  20130401 15:00   AAT

I am using this as a prototype to load in data from a much bigger data file and create one DataFrame. The other DataFrame looks like this (df2):

Chan          date_time  Sens1  Sens2 
 AAC  01-Apr-2013 09:00   5.17   1281
 AAC  01-Apr-2013 10:00   5.01    500
 AAC  01-Apr-2013 12:00   5.17    100
 AAC  01-Apr-2013 13:00   5.19  41997
 AAC  01-Apr-2013 16:00   5.21   2123
 AAT  01-Apr-2013 09:00  28.82    300
 AAT  01-Apr-2013 10:00  28.35   4900
 AAT  01-Apr-2013 12:00  28.04    250
 AAE  01-Apr-2013 11:00   3.36    400
 AAE  01-Apr-2013 12:00   3.41    200
 AAE  01-Apr-2013 13:00   3.40   2388
 AAE  01-Apr-2013 14:00   3.37    300
 AAE  01-Apr-2013 15:00   3.35    500
 AXN  01-Apr-2013 09:00  23.96   6643
 AXN  01-Apr-2013 10:00  24.03   1000
 AXW  01-Apr-2013 11:00  46.44   2343

So what I want to do is search df2 for all rows that match both columns of df1 (noting the different data formats) and insert the data from df2 into df1, like this (df1):

         Date_Time  Chan  Sens1  Sens2 
 0   20130401 9:00   AAT  28.82    300
 1  20130401 10:00   AAT  28.35   4900
 2  20130401 11:00   AAT    NaN    NaN
 3  20130401 12:00   AAT  28.04    250
 4  20130401 13:00   AAT    NaN    NaN
 5  20130401 14:00   AAT    NaN    NaN
 6  20130401 15:00   AAT    NaN    NaN

Could you give me some suggestions for the python/pandas code to match this pseudocode:

if (df1['Date_Time'] == df2['date_time']) & (df1['Chan'] == df2['Chan']):
    df1['Sens1'] = df2['Sens1']
    df1['Sens2'] = df2['Sens2']

If it affects the answer, I intend to bfill and ffill the NaNs, then add this DataFrame to a Panel, and then repeat with another channel name in place of AAT.

Answered by Andy Hayden

You can use a plain ol' merge to do this. But first, you should do a little cleanup of your DataFrames to make sure your datetime columns are actually datetimes rather than strings (note: it may be better to do this when reading the data, as CSV or whatever):

import pandas as pd

df1['Date_Time'] = pd.to_datetime(df1['Date_Time'], format='%Y%m%d %H:%M')
df2['date_time'] = pd.to_datetime(df2['date_time'])
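
To see why this cleanup matters, here is a minimal sketch that parses a couple of the sample strings from each frame and checks that they end up as the same timestamps. The names raw_df1, raw_df2, parsed_df1 and parsed_df2 are made up for illustration, and the explicit format for the df2 strings is an assumption (it also assumes an English locale for the month abbreviation):

```python
import pandas as pd

# A few strings copied from the two frames in the question.
raw_df1 = pd.Series(["20130401 9:00", "20130401 10:00"])
raw_df2 = pd.Series(["01-Apr-2013 09:00", "01-Apr-2013 10:00"])

# As plain strings the two columns never compare equal.
print((raw_df1 == raw_df2).any())

# Explicit formats spell out how each column is laid out.
# '%d-%b-%Y %H:%M' is an assumed format for the df2 strings.
parsed_df1 = pd.to_datetime(raw_df1, format="%Y%m%d %H:%M")
parsed_df2 = pd.to_datetime(raw_df2, format="%d-%b-%Y %H:%M")

# After parsing, both hold real timestamps that can serve as merge keys.
print(parsed_df1.equals(parsed_df2))
```

Once both columns are real datetimes, the differing text formats no longer matter for matching rows.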

Let's also rename the datetime columns so that they share the same name:

df1.rename(columns={'Date_Time': 'Datetime'}, inplace=True)
df2.rename(columns={'date_time': 'Datetime'}, inplace=True)
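
If you would rather not rename anything, merge can also be told which columns to pair up via left_on and right_on. This is not part of the original answer, just a possible alternative, sketched on a cut-down version of the sample data (the small frames and the name merged are illustrative):

```python
import pandas as pd

# Cut-down stand-ins for df1 and df2, with datetimes already parsed.
df1 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2013-04-01 09:00", "2013-04-01 11:00"]),
    "Chan": ["AAT", "AAT"],
})
df2 = pd.DataFrame({
    "Chan": ["AAC", "AAT"],
    "date_time": pd.to_datetime(["2013-04-01 09:00", "2013-04-01 09:00"]),
    "Sens1": [5.17, 28.82],
    "Sens2": [1281, 300],
})

# Pair Date_Time with date_time, and Chan with Chan, without renaming.
merged = df1.merge(
    df2,
    how="left",
    left_on=["Date_Time", "Chan"],
    right_on=["date_time", "Chan"],
)

# Differently named key columns are both kept, so drop the right-hand copy.
merged = merged.drop(columns="date_time")
print(merged)
```

Renaming first, as above, keeps the result tidier because no leftover key column has to be dropped afterwards.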

Now a simple merge will give you what you're after:

In [11]: df1.merge(df2)
Out[11]: 
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 12:00:00  AAT  28.04    250

In [12]: df1.merge(df2, how='left')
Out[12]: 
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 11:00:00  AAT    NaN    NaN
3 2013-04-01 12:00:00  AAT  28.04    250
4 2013-04-01 13:00:00  AAT    NaN    NaN
5 2013-04-01 14:00:00  AAT    NaN    NaN
6 2013-04-01 15:00:00  AAT    NaN    NaN
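
For reference, here is a self-contained sketch that puts the steps together on a subset of the sample data and then fills the gaps the way the question describes (ffill followed by bfill). The on= argument is spelled out for clarity, and the names merged and filled are illustrative rather than anything from the original answer:

```python
import pandas as pd

# df1: hourly slots for one channel, 09:00 through 15:00.
df1 = pd.DataFrame({
    "Datetime": pd.to_datetime([f"2013-04-01 {h:02d}:00" for h in range(9, 16)]),
    "Chan": "AAT",
})

# df2: a few rows from the question, including another channel.
df2 = pd.DataFrame({
    "Chan": ["AAC", "AAT", "AAT", "AAT"],
    "Datetime": pd.to_datetime([
        "2013-04-01 09:00", "2013-04-01 09:00",
        "2013-04-01 10:00", "2013-04-01 12:00",
    ]),
    "Sens1": [5.17, 28.82, 28.35, 28.04],
    "Sens2": [1281, 300, 4900, 250],
})

# Left merge keeps every df1 row; unmatched rows get NaN sensor values.
merged = df1.merge(df2, how="left", on=["Datetime", "Chan"])

# Forward-fill first, then back-fill anything still missing at the start.
filled = merged.ffill().bfill()
print(filled)
```

Note that the sensor columns come back as floats after the left merge, since NaN cannot be stored in an integer column.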