Original question: http://stackoverflow.com/questions/34843786/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Renaming pandas data frame columns using a for loop
Question by scrollex
I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns in each one to reflect the name of that data frame (I'll be performing an outer merge of all of them afterwards).
Let's say the data frames are called df1, df2 and df3, and each contains the columns name, date, and count.
I'd like to rename the columns in df1 to name_df1, date_df1, and count_df1.
I've written a function to rename the columns, thus:
df_list=[df1, df2, df3]

def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix

for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()
    col_names=[col_name,col_date,col_count]
    x.columns=col_names
Unfortunately, I get the following error: KeyError: 'name'
I'm really struggling to figure out why that's going on. The columns of df1, the first data frame in df_list, get renamed. Everything else stays the same... Am I messing up basic syntax (probably), or do I fundamentally misunderstand how things should work?
From what I can ascertain, the first data frame in the list is being iterated through more than once — but why would that be the case?
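One plausible reading of the KeyError, taking at face value the report that df1 is renamed first: every branch condition reads df1['name'], so once df1's columns carry the _df1 suffix, the comparison on the next pass looks up a column that no longer exists. A minimal reproduction with made-up data (the frames below are stand-ins, not the asker's data):

```python
import pandas as pd

# Made-up stand-ins for two of the question's data frames.
df1 = pd.DataFrame({"name": ["a"], "date": ["2016-01-01"], "count": [1]})
df2 = pd.DataFrame({"name": ["b"], "date": ["2016-01-02"], "count": [2]})

# Suppose the first pass through the loop has already renamed df1.
df1.columns = ["name_df1", "date_df1", "count_df1"]

# The next pass evaluates the same kind of comparison used in the if-condition.
caught = None
try:
    same_frame = df2["name"].tail(1).item() == df1["name"].tail(1).item()
except KeyError as err:
    caught = err  # df1 has no 'name' column any more

print("KeyError:", caught)  # prints KeyError: 'name'
```

Comparing against df1 by column value is what couples the loop to df1's current column names; iterating over the frames together with their names avoids that lookup entirely.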
Answer by mgc
I guess you can achieve this with something simpler, like this:
df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
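A quick runnable check of this loop on three made-up frames. Because assigning to df.columns modifies the DataFrame object itself, the original df1, df2 and df3 variables see the new names. Note that the suffix comes from each frame's position in the list, not from its variable name.

```python
import pandas as pd

# Made-up frames with identical columns.
df1 = pd.DataFrame({"name": ["a"], "date": ["d1"], "count": [1]})
df2 = pd.DataFrame({"name": ["b"], "date": ["d2"], "count": [2]})
df3 = pd.DataFrame({"name": ["c"], "date": ["d3"], "count": [3]})

df_list = [df1, df2, df3]
# enumerate(..., 1) numbers the frames from 1, giving suffixes _df1, _df2, _df3.
for i, df in enumerate(df_list, 1):
    # Assigning df.columns mutates the frame, so df1..df3 are renamed as well.
    df.columns = [col_name + "_df{}".format(i) for col_name in df.columns]

print(list(df2.columns))  # ['name_df2', 'date_df2', 'count_df2']
```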
If your DataFrames have prettier names you can try:
df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]
Or you can fetch the name of each variable by looking it up in globals() (or locals()):
df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]
Answer by maxymoo
I'll suppose that you have your data frames stored in a dictionary, as this is the idiomatic way of storing a series of named objects in Python. The idiomatic pandas way of changing your column names is to use a vectorised string operation on df.columns:
df_dict = {"df1":df1, "df2":df2, "df3":df3}
for name, df in df_dict.items():
    df.columns = df.columns + "_" + name
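pandas also provides DataFrame.add_suffix, which appends a string to every column label and returns a new DataFrame instead of modifying the original. A minimal sketch with made-up frames, rebuilding the dictionary rather than mutating the frames in place:

```python
import pandas as pd

# Made-up frames keyed by the names we want as suffixes.
df_dict = {
    "df1": pd.DataFrame({"name": ["a"], "date": ["d1"], "count": [1]}),
    "df2": pd.DataFrame({"name": ["b"], "date": ["d2"], "count": [2]}),
}

# add_suffix returns a new frame; the originals in df_dict keep their names.
renamed = {name: df.add_suffix("_" + name) for name, df in df_dict.items()}

print(list(renamed["df1"].columns))  # ['name_df1', 'date_df1', 'count_df1']
```

Returning new frames keeps the unsuffixed originals around, which can be handy if the same data is merged more than once.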
Another option to consider is adding the suffixes automatically during the merge. When you call merge, you can use the suffixes parameter to specify the suffixes that will be appended to duplicate column names. If you just want to append the names of the data frames, you can call it like this:
from functools import reduce  # reduce lives in functools, not itertools
df_merged = reduce(lambda x, y: ("df_merged",
                                 x[1].merge(y[1], left_index=True, right_index=True,
                                            suffixes=("", "_" + y[0]))),
                   df_dict.items())[1]
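The question mentions an outer merge, while merge defaults to how='inner'. Below is a sketch of the same suffix idea with an explicit how='outer', written as a plain loop instead of reduce so that no (name, frame) tuple has to be threaded through. The toy frames and their indexes are made up for illustration.

```python
import pandas as pd

# Made-up frames with identical columns and partly overlapping indexes.
df_dict = {
    "df1": pd.DataFrame({"name": ["a", "b"], "date": ["d1", "d2"], "count": [1, 2]}, index=[0, 1]),
    "df2": pd.DataFrame({"name": ["c"], "date": ["d3"], "count": [3]}, index=[1]),
    "df3": pd.DataFrame({"name": ["e"], "date": ["d4"], "count": [4]}, index=[2]),
}

items = iter(df_dict.items())
_, merged = next(items)  # the first frame keeps its original column names
for name, df in items:
    # Overlapping right-hand columns get "_<name>"; the left side gets "".
    merged = merged.merge(
        df, how="outer", left_index=True, right_index=True, suffixes=("", "_" + name)
    )

print(list(merged.columns))
```

suffixes only touches overlapping columns, so with this scheme the first frame's columns stay unsuffixed; rows missing from a frame come back as NaN.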
Answer by majr
For completeness, since nobody has mentioned df.rename, see Andy Hayden's answer here:
df.rename can take a function as an argument, so in this case:
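df_dict = {'df1':df1,'df2':df2,'df3':df3}
for name,df in df_dict.items():
    # columns= applies the function to the column labels; passed positionally,
    # the function would rename the index labels instead
    df.rename(columns=lambda x: x+'_'+name, inplace=True)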
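When a function is passed to rename positionally, it is applied to the index labels by default, not to the columns; the columns= keyword targets the column labels. A quick check on a made-up frame with a string index:

```python
import pandas as pd

# Made-up frame; a string index makes the effect on the index visible.
df = pd.DataFrame({"name": ["a"], "date": ["d1"], "count": [1]}, index=["r0"])

# Positional function: renames the index labels, columns are untouched.
by_default = df.rename(lambda label: label + "_df1")

# columns= keyword: renames the column labels, index is untouched.
by_columns = df.rename(columns=lambda label: label + "_df1")

print(list(by_default.index), list(by_default.columns))
print(list(by_columns.index), list(by_columns.columns))
```

With the default integer index, the positional version would instead fail, because it tries to add a string to an integer label.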