pandas: Create new column in dataframe with match values from other dataframe

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/46789098/

Date: 2020-09-14 04:39:02  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Gonzalo

I have two dataframes: one has only some of the information (df1) and the other has all the data (df2). I am trying to create a new column in df1 that looks up the Total2 values and fills the new column according to Name. Every Name in df1 is guaranteed to have a match in the Name column of df2. Is there a pandas function that already does this? My end goal is to create a bar chart.

import pandas as pd

alldatapath = "all_data.csv"
filteredpath = "filtered.csv"

df1 = pd.read_csv(
    filteredpath,      # file name
    sep=',',           # column separator
    quotechar='"',     # quoting character
    na_values="NA",    # treat the string "NA" as a missing value (NaN)
    usecols=[0, 1],    # columns to use
    decimal='.')       # symbol for decimals

df2 = pd.read_csv(
    alldatapath,       # file name
    sep=',',           # column separator
    quotechar='"',     # quoting character
    na_values="NA",    # treat the string "NA" as a missing value (NaN)
    usecols=[0, 1],    # columns to use
    decimal='.')       # symbol for decimals

df1 = df1.head(5)  # keep only the first 5 rows

print(df1)
print(df2)
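Because all_data.csv and filtered.csv are not included in the question, here is a minimal reproducible sketch that builds df1 and df2 from inline CSV text with io.StringIO. The CSV contents are an assumption reconstructed from the printed outputs below (filtered.csv likely has more than the five rows shown); the read_csv options mirror the ones above.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the printed df1/df2 outputs
FILTERED_CSV = """Name,Total
Accounting,3
Reporting,1
Finance,1
Audit,1
Template,2
"""

ALL_DATA_CSV = """Name,Total2
Reporting,100
Accounting,120
Finance,400
Audit,500
Information,50
Template,1200
KnowHow,2000
"""

# Same parsing options as in the question; StringIO stands in for the files
read_opts = dict(sep=',', quotechar='"', na_values="NA",
                 usecols=[0, 1], decimal='.')

df1 = pd.read_csv(io.StringIO(FILTERED_CSV), **read_opts).head(5)
df2 = pd.read_csv(io.StringIO(ALL_DATA_CSV), **read_opts)

print(df1)
print(df2)
```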

Output (df1):

         Name  Total
0  Accounting      3
1   Reporting      1
2     Finance      1
3       Audit      1
4    Template      2

Output (df2):

          Name   Total2
0    Reporting    100
1   Accounting    120
2      Finance    400
3        Audit    500
4  Information     50
5     Template   1200
6      KnowHow   2000

The final output (df1) should look something like this:

         Name  Total  Total2(new column)
0  Accounting      3    120
1   Reporting      1    100
2     Finance      1    400
3       Audit      1    500
4    Template      2   1200
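As an alternative to the lookup approach in the answer below (a different technique, not the answerer's), a left join with DataFrame.merge on Name produces the same table. This is a minimal sketch; the sample data is rebuilt from the printed outputs above, which is an assumption since the CSV files are not shown.

```python
import pandas as pd

# Sample data reconstructed from the printed outputs in the question
df1 = pd.DataFrame({
    "Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
    "Total": [3, 1, 1, 1, 2],
})
df2 = pd.DataFrame({
    "Name": ["Reporting", "Accounting", "Finance", "Audit",
             "Information", "Template", "KnowHow"],
    "Total2": [100, 120, 400, 500, 50, 1200, 2000],
})

# Left join: keep every row of df1 (in df1's order) and attach Total2 from df2.
# validate="many_to_one" raises MergeError if a Name appears more than once in df2.
result = df1.merge(df2[["Name", "Total2"]], on="Name", how="left",
                   validate="many_to_one")
print(result)
```

The validate argument is optional; it simply guards against duplicate names in df2 silently multiplying rows of df1.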

Answer by jezrael

First, create the new column with map, passing a Series built from df2 (Name as the index, Total2 as the values):

df1['Total2'] = df1['Name'].map(df2.set_index('Name')['Total2'])
print (df1)
         Name  Total  Total2
0  Accounting      3     120
1   Reporting      1     100
2     Finance      1     400
3       Audit      1     500
4    Template      2    1200
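To make the lookup explicit: df2.set_index('Name')['Total2'] is a Series whose index is Name and whose values are Total2, and Series.map replaces each Name in df1 with the value stored at that label. The sketch below uses data reconstructed from the printed outputs (an assumption) and also maps a hypothetical name, 'Payroll', to show that a name with no match in df2 becomes NaN instead of raising an error.

```python
import pandas as pd

# Sample data reconstructed from the printed outputs in the question
df1 = pd.DataFrame({
    "Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
    "Total": [3, 1, 1, 1, 2],
})
df2 = pd.DataFrame({
    "Name": ["Reporting", "Accounting", "Finance", "Audit",
             "Information", "Template", "KnowHow"],
    "Total2": [100, 120, 400, 500, 50, 1200, 2000],
})

# Lookup Series: index = Name, values = Total2
lookup = df2.set_index("Name")["Total2"]

# Each Name in df1 is replaced by lookup[Name]
df1["Total2"] = df1["Name"].map(lookup)
print(df1)

# 'Payroll' is a hypothetical name that does not exist in df2
unmatched = pd.Series(["Audit", "Payroll"]).map(lookup)
print(unmatched)  # Audit -> 500.0, Payroll -> NaN (dtype becomes float64)
```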

Then use set_index together with DataFrame.plot.bar to draw the bar chart:

import matplotlib.pyplot as plt

df1.set_index('Name').plot.bar()  # one group of bars per Name: Total and Total2
plt.show()