Renaming Pandas DataFrame columns using a for loop

Question

Asked by scrollex

I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns within each to reflect the names of each data frame (I'll be performing an outer merge of all of these afterwards).

Let's say the data frames are called df1, df2 and df3, and each contains the columns name, date, and count.

I'd like to rename the columns in df1 to name_df1, date_df1, and count_df1.

I've written a function to rename the columns, thus:

df_list=[df1, df2, df3]

def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix

for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()

    col_names=[col_name,col_date,col_count]
    x.columns=col_names

Unfortunately, I get the following error: KeyError: 'name'

I'm really struggling to figure out why that's happening. The columns of df1, the first data frame in df_list, get renamed. Everything else stays the same... Am I messing up basic syntax (probably), or do I have a fundamental misunderstanding of how things should work?

From what I can ascertain, the first data frame in the list is being iterated through more than once — but why would that be the case?

Answer 1

Answered by mgc

I guess you can achieve this with something simpler, like this:

df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
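As a minimal, self-contained sketch of the loop above, here it is run on three small frames. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample frames with identical columns, standing in for df1..df3.
df1 = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})
df2 = pd.DataFrame({"name": ["b"], "date": ["2020-01-02"], "count": [2]})
df3 = pd.DataFrame({"name": ["c"], "date": ["2020-01-03"], "count": [3]})

df_list = [df1, df2, df3]
for i, df in enumerate(df_list, 1):
    # Assigning to df.columns changes the frame itself, so df1..df3 are updated.
    df.columns = ["{}_df{}".format(col, i) for col in df.columns]

print(list(df1.columns))
print(list(df3.columns))
```

Because the suffix comes from the position in df_list, this only works if the list order matches the variable names.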

If your DataFrames have prettier names you can try:

df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]

Or you can fetch the name of each variable by looking it up in globals() (or locals()):

df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]

Answer 2

Answered by maxymoo

I'll suppose that you have your DataFrames stored in a dictionary, as this is the idiomatic way of storing a series of named objects in Python. The idiomatic pandas way of changing your column names is to use a vectorised string operation on df.columns:

df_dict = {"df1":df1, "df2":df2, "df3":df3}
for name, df in df_dict.items():
    df.columns = df.columns + "_" + name
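A related option, not mentioned in the answer itself, is pandas' built-in `DataFrame.add_suffix`. It returns a new frame instead of modifying the original. The sketch below assumes the same dictionary layout; the sample data and the `renamed` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames; only the column names come from the question.
df1 = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})
df2 = pd.DataFrame({"name": ["b"], "date": ["2020-01-02"], "count": [2]})
df_dict = {"df1": df1, "df2": df2}

# add_suffix returns a renamed copy, so collect the results in a new dict.
renamed = {name: df.add_suffix("_" + name) for name, df in df_dict.items()}

print(list(renamed["df2"].columns))
print(list(df2.columns))  # the original frame keeps its column names
```

This may be preferable when the original frames are still needed with their unsuffixed column names.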

Another option to consider is adding the suffixes automatically during the merge. When you call merge, you can use the suffixes parameter to specify the suffixes that will be appended to duplicate column names. If you just want to append the names of the DataFrames, you can call it like this:

from functools import reduce  # reduce lives in functools, not itertools
df_merged = reduce(lambda x, y: ("df_merged",
                                 x[1].merge(y[1], left_index=True, right_index=True,
                                            suffixes=("", "_" + y[0]))),
                   df_dict.items())[1]
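To see what this merge produces, here is a small runnable sketch with made-up data. The helper name `merge_pair` is hypothetical, and `how="outer"` is an addition to match the outer merge the question asks for. Note that suffixes apply only to overlapping column names, and the empty left suffix means the first frame's columns stay unsuffixed.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames sharing the same columns and index.
df1 = pd.DataFrame({"name": ["a", "b"], "count": [1, 2]})
df2 = pd.DataFrame({"name": ["c", "d"], "count": [3, 4]})
df3 = pd.DataFrame({"name": ["e", "f"], "count": [5, 6]})
df_dict = {"df1": df1, "df2": df2, "df3": df3}


def merge_pair(left, right):
    """Merge two (name, frame) pairs on the index, suffixing the right frame."""
    _, left_df = left
    right_name, right_df = right
    merged = left_df.merge(
        right_df,
        left_index=True,
        right_index=True,
        how="outer",  # assumption: outer merge, as described in the question
        suffixes=("", "_" + right_name),
    )
    return ("merged", merged)


_, df_merged = reduce(merge_pair, df_dict.items())
print(list(df_merged.columns))
```

If every frame should carry its own suffix, including the first, renaming the columns before merging is the simpler route.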

Answer 3

Answered by majr

For completeness, since nobody has mentioned df.rename, see Andy Hayden's answer here:

Renaming columns in pandas

df.rename can take a function as an argument, so in this case:

df_dict = {'df1': df1, 'df2': df2, 'df3': df3}
for name, df in df_dict.items():
    # Pass the function as columns=; a bare mapper is applied to the index instead.
    df.rename(columns=lambda x: x + '_' + name, inplace=True)
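Note that `DataFrame.rename` applies a mapper passed as the first positional argument to the index labels, not the columns; passing it as `columns=` targets the column labels. A small sketch of the difference, using made-up data and a string index so both calls succeed:

```python
import pandas as pd

# Hypothetical one-row frame with a string index label.
df1 = pd.DataFrame({"name": ["a"], "count": [1]}, index=["row0"])

# A bare mapper is applied to the index labels (axis 0).
by_default = df1.rename(lambda label: label + "_df1")

# columns= applies the mapper to the column labels instead.
by_columns = df1.rename(columns=lambda label: label + "_df1")

print(list(by_default.index), list(by_default.columns))
print(list(by_columns.index), list(by_columns.columns))
```

With the default integer index, the bare-mapper form would try to add a string to an integer label and fail, so `columns=` is the form to use here.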
