Renaming pandas data frame columns using a for loop

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/34843786/

Time: 2020-09-14 00:31:23  Source: igfitidea

python pandas

Asked by scrollex

I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns within each to reflect the names of each data frame (I'll be performing an outer merge of all of these afterwards).

Let's say the data frames are called df1, df2, and df3, and each contains the columns name, date, and count.

I'd like to rename the columns in df1 to name_df1, date_df1, and count_df1.

I've written a function to rename the columns, thus:

df_list=[df1, df2, df3]

def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix

for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()

    col_names=[col_name,col_date,col_count]
    x.columns=col_names

Unfortunately, I get the following error: KeyError: 'name'

I'm really struggling to figure out why that's going on. The columns of df1, the first data frame in df_list, get renamed. Everything else stays the same... Am I messing up basic syntax (probably), or do I have a fundamental misunderstanding of how things should work?

From what I can ascertain, the first data frame in the list is being iterated through more than once — but why would that be the case?

Answered by mgc

I guess you can achieve this with something simpler, like this:

df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
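
To make the loop above easy to try, here is a minimal self-contained sketch. The three sample frames and their values are made up for illustration; only the column names (name, date, count) come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df1 = pd.DataFrame({"name": ["a", "b"], "date": ["2016-01-01", "2016-01-02"], "count": [1, 2]})
df2 = pd.DataFrame({"name": ["c", "d"], "date": ["2016-01-03", "2016-01-04"], "count": [3, 4]})
df3 = pd.DataFrame({"name": ["e", "f"], "date": ["2016-01-05", "2016-01-06"], "count": [5, 6]})

df_list = [df1, df2, df3]
for i, df in enumerate(df_list, 1):
    # Assigning to df.columns changes the existing frame in place,
    # so df1, df2 and df3 themselves end up with the new names.
    df.columns = ["{}_df{}".format(col, i) for col in df.columns]

print(list(df1.columns))  # ['name_df1', 'date_df1', 'count_df1']
print(list(df3.columns))  # ['name_df3', 'date_df3', 'count_df3']
```

Because the suffix is derived from the position in the list, the order of df_list has to match the numbering of the variables.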

If your DataFrames have prettier names you can try:

df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]

Or you can fetch the name of each variable by looking it up in globals() (or locals()):

df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]

Answered by maxymoo

I'll suppose that you have your data frames stored in a dictionary, as this is the idiomatic way of storing a series of named objects in Python. The idiomatic pandas way of changing your column names is to use a vectorised string operation on df.columns:

df_dict = {"df1":df1, "df2":df2, "df3":df3}
for name, df in df_dict.items():
    df.columns = df.columns + "_" + name
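
As a possible variation on the dictionary approach (not part of the original answer), pandas also provides DataFrame.add_suffix, which returns a renamed copy instead of modifying the frame in place. The sketch below assumes made-up sample data and a hypothetical dictionary called frames.

```python
import pandas as pd

# Hypothetical sample data keyed by the name each frame should carry.
frames = {
    "df1": pd.DataFrame({"name": ["a", "b"], "date": ["2016-01-01", "2016-01-02"], "count": [1, 2]}),
    "df2": pd.DataFrame({"name": ["c", "d"], "date": ["2016-01-03", "2016-01-04"], "count": [3, 4]}),
}

# add_suffix appends the string to every column label and returns a new frame,
# so the originals in `frames` keep their column names.
renamed = {name: df.add_suffix("_" + name) for name, df in frames.items()}

print(list(renamed["df2"].columns))  # ['name_df2', 'date_df2', 'count_df2']
print(list(frames["df2"].columns))   # ['name', 'date', 'count']
```

Returning copies can be convenient when the original frames are still needed under their old column names.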

Another option to consider is adding the suffixes automatically during the merge. When you call merge, you can use the suffixes parameter to specify the suffixes that will be appended to duplicate column names. If you just want to append the names of the dataframes, you can call it like this:

from functools import reduce
df_merged = reduce(lambda x,y: ("df_merged", 
                               x[1].merge(y[1], left_index=True, right_index=True, 
                                         suffixes = ("","_"+y[0]))),
                   df_dict.items())[1]
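
The same idea can be spelled out with a named helper, which may be easier to follow than the nested lambda. This is only a sketch under assumptions: the sample data and the helper name merge_pair are made up, and how="outer" is added because the question mentions an outer merge.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; all frames share the same column names.
df_dict = {
    "df1": pd.DataFrame({"name": ["a", "b"], "count": [1, 2]}),
    "df2": pd.DataFrame({"name": ["c", "d"], "count": [3, 4]}),
    "df3": pd.DataFrame({"name": ["e", "f"], "count": [5, 6]}),
}

def merge_pair(left, right):
    """Merge two (name, frame) pairs on the index, suffixing the right-hand columns."""
    _, left_df = left
    right_name, right_df = right
    merged = left_df.merge(
        right_df,
        left_index=True,
        right_index=True,
        how="outer",  # assumption: the question asks for an outer merge
        suffixes=("", "_" + right_name),  # left columns stay as they are
    )
    return ("merged", merged)

_, df_merged = reduce(merge_pair, df_dict.items())
print(list(df_merged.columns))
# ['name', 'count', 'name_df2', 'count_df2', 'name_df3', 'count_df3']
```

Note that with an empty left suffix, the columns of the first frame keep their original names; only the frames merged in later receive a suffix.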

Answered by majr

For completeness, since nobody has mentioned df.rename, see Andy Hayden's answer here:

Renaming columns in pandas

df.rename can take a function as an argument, so in this case:

df_dict = {'df1':df1,'df2':df2,'df3':df3}
for name,df in df_dict.items():
    df.rename(columns=lambda x: x+'_'+name, inplace=True)
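
One detail worth checking when passing a function to df.rename: a function given positionally is applied to the index labels by default, so the columns keyword is needed to target the column names. A minimal sketch with made-up data (the row labels r1 and r2 are hypothetical):

```python
import pandas as pd

# Hypothetical frame with string row labels so both cases can be shown.
df = pd.DataFrame({"name": ["a", "b"], "count": [1, 2]}, index=["r1", "r2"])

# With columns=..., the function is applied to each column label.
renamed_cols = df.rename(columns=lambda c: c + "_df1")

# Without a keyword, the function is applied to the index (row) labels instead.
renamed_rows = df.rename(lambda r: r + "_df1")

print(list(renamed_cols.columns))  # ['name_df1', 'count_df1']
print(list(renamed_rows.index))    # ['r1_df1', 'r2_df1']
print(list(renamed_rows.columns))  # ['name', 'count']
```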