
Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40774787/

Time: 2020-09-14 02:29:36  Source: igfitidea

Renaming columns in a Pandas dataframe with duplicate column names?

python pandas

Asked by Alex Kinman

I have a df X_R whose columns have duplicate names:

In [77]: X_R
Out[77]: 
      dollars  dollars
   0   0.7085   0.5000

I want to rename it so that I have:


In [77]: X_R
Out[77]: 
       Retail   Cost
   0   0.7085   0.5000

Using the Pandas rename function doesn't work:

X_R.rename(index=str, columns={"dollars": "Retail", "dollars": "Cost"})

This just gives me two columns named Cost.
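The reason rename cannot do this: a Python dict literal keeps only the last value for a repeated key, so the mapping passed to rename holds a single entry, and rename maps by label, so both columns named dollars get the same new name. A minimal illustration in plain Python (the variable name mapping is only for this sketch):

```python
# A dict literal with a repeated key keeps only the last value,
# so rename() never sees the "Retail" entry at all.
mapping = {"dollars": "Retail", "dollars": "Cost"}

print(len(mapping))  # number of entries that survive
print(mapping)       # the only remaining entry maps "dollars" to "Cost"
```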

How can I rename the columns in this case?


Accepted answer by Mihkorz

X_R.columns = ['Retail','Cost']
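For anyone who wants to try this end to end, here is a small sketch. The one-row frame is rebuilt from the values shown in the question; how the real X_R was created is not stated, so the constructor call is an assumption.

```python
import pandas as pd

# Rebuild the one-row frame from the question, with two columns named "dollars".
X_R = pd.DataFrame([[0.7085, 0.5000]], columns=["dollars", "dollars"])

# Assigning a list to .columns renames by position, so duplicate labels
# are not a problem. The list needs exactly one entry per column.
X_R.columns = ["Retail", "Cost"]

print(X_R)
```

Because the assignment is positional, the order of the new names has to match the order of the existing columns.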

Answer by gbtimmon

Here is another dynamic solution that I think is nicer:

In [59]: df
Out[59]:
   a  x  x  x  z
0  6  2  7  7  8
1  6  6  3  1  1
2  6  6  7  5  6
3  8  3  6  1  8
4  5  7  5  3  0
In [61]: class renamer():
             def __init__(self):
                 self.d = dict()

             def __call__(self, x):
                 if x not in self.d:
                     self.d[x] = 0
                     return x
                 else:
                     self.d[x] += 1
                     return "%s_%d" % (x, self.d[x])

         df.rename(columns=renamer())
Out[61]:
   a  x  x_1  x_2  z
0  6  2    7    7  8
1  6  6    3    1  1
2  6  6    7    5  6
3  8  3    6    1  8
4  5  7    5    3  0
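One thing to watch with this approach is that the callable carries state, which is why the call above passes a fresh renamer() each time. The sketch below applies the same logic to a plain list of labels so it runs without a DataFrame; the class name Renamer, the attribute seen and the variable names are mine, not from the answer.

```python
class Renamer:
    """Callable that appends _1, _2, ... to repeated labels (same logic as above)."""

    def __init__(self):
        self.seen = {}  # label -> number of repeats seen so far

    def __call__(self, label):
        if label not in self.seen:
            self.seen[label] = 0
            return label
        self.seen[label] += 1
        return "%s_%d" % (label, self.seen[label])


labels = ["a", "x", "x", "x", "z"]

rename = Renamer()
first_pass = [rename(c) for c in labels]
# Reusing the same instance keeps the old counts, so every label now looks repeated.
second_pass = [rename(c) for c in labels]

print(first_pass)
print(second_pass)
```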

Answer by MaxU

Here is a dynamic solution:


In [59]: df
Out[59]:
   a  x  x  x  z
0  6  2  7  7  8
1  6  6  3  1  1
2  6  6  7  5  6
3  8  3  6  1  8
4  5  7  5  3  0

In [60]: d
Out[60]: {'x': ['x1', 'x2', 'x3']}

In [61]: df.rename(columns=lambda c: d[c].pop(0) if c in d.keys() else c)
Out[61]:
   a  x1  x2  x3  z
0  6   2   7   7  8
1  6   6   3   1  1
2  6   6   7   5  6
3  8   3   6   1  8
4  5   7   5   3  0
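Note that the lambda pops names out of d, so after one rename the lists are empty and running the same line again would raise an IndexError. A hedged sketch of one way around that is to rename from a deep copy; the two-row frame and the name pending are made up for this example.

```python
import copy

import pandas as pd

df = pd.DataFrame([[6, 2, 7, 7, 8], [6, 6, 3, 1, 1]],
                  columns=["a", "x", "x", "x", "z"])
d = {"x": ["x1", "x2", "x3"]}

# Work on a deep copy so the pop(0) calls do not empty the original lists in d.
pending = copy.deepcopy(d)
renamed = df.rename(columns=lambda c: pending[c].pop(0) if c in pending else c)

print(list(renamed.columns))
print(pending)  # the copy has been used up
print(d)        # the original mapping is untouched
```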

Answer by Benedictanjw

MaxU's answer helped me with this same problem. In this answer, I add a way to find the duplicated column headers.

First, we make a dictionary of the duplicated column names with values corresponding to the desired new column names. For this I use the defaultdict subclass from collections; created without a default factory, as here, it behaves like a plain dict.

import pandas as pd
from collections import defaultdict

renamer = defaultdict()

We iterate over the duplicated column names to create a dictionary whose keys are the duplicated column names and whose values are lists of new column names. I have chosen the new names to be <original name>_0, <original name>_1, and so on.

for col in df.columns[df.columns.duplicated(keep=False)].tolist():
    if col not in renamer:
        renamer[col] = [col+'_0']
    else:
        renamer[col].append(col+'_'+str(len(renamer[col])))

print(renamer)
defaultdict(None, {'b': ['b_0', 'b_1', 'b_2', 'b_3'], 'c': ['c_0', 'c_1']})
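If you give defaultdict a list factory, the if/else branch is not needed. The following is only a sketch of that variation in plain Python: it takes the labels as a list (copied from the example frame in this answer) and uses Counter to find the duplicated ones instead of df.columns.duplicated.

```python
from collections import Counter, defaultdict

# Column labels copied from the example data frame in this answer.
columns = ["a", "b", "b", "b", "b", "c", "c", "d"]

counts = Counter(columns)
renamer = defaultdict(list)  # missing keys start as an empty list

for col in columns:
    if counts[col] > 1:  # only labels that occur more than once
        renamer[col].append("%s_%d" % (col, len(renamer[col])))

print(dict(renamer))
```

A membership test such as c in renamer does not create new keys, so unduplicated labels stay out of the dictionary.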

Original dataframe:


print(df)
        a   b   b   b   b   c   c   d
Item 0  2   1   0   2   8   3   9   5
Item 1  3   2   7   3   5   4   6   2
Item 2  4   3   8   1   5   7   4   4
Item 3  5   5   3   6   0   5   2   5

Rename the duplicated columns by assigning the new names from our renamer defaultdict, leaving the unduplicated columns alone:

df.rename(columns=lambda c: renamer[c].pop(0) if c in renamer else c)
        a   b_0 b_1 b_2 b_3 c_0 c_1 d
Item 0  2   1   0   2   8   3   9   5
Item 1  3   2   7   3   5   4   6   2
Item 2  4   3   8   1   5   7   4   4
Item 3  5   5   3   6   0   5   2   5
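For comparison, here is a stateless variation that gives the same names. This is not the method used above: it swaps the pop-based lambda for groupby/cumcount on the labels and then assigns the result to df.columns. The helper name suffix_duplicates and the one-row frame are assumptions for the sketch.

```python
import pandas as pd


def suffix_duplicates(columns):
    """Return new labels, adding _0, _1, ... only to labels that occur more than once."""
    labels = pd.Series(list(columns), dtype="object")
    is_dup = labels.duplicated(keep=False)         # True for every repeated label
    position = labels.groupby(labels).cumcount()   # 0, 1, 2, ... within each label
    return [
        f"{name}_{pos}" if dup else name
        for name, pos, dup in zip(labels, position, is_dup)
    ]


df = pd.DataFrame([[2, 1, 0, 2, 8, 3, 9, 5]], columns=list("abbbbccd"))
df.columns = suffix_duplicates(df.columns)

print(list(df.columns))
```

Since nothing is popped from a shared dictionary, the helper can be called repeatedly on the same labels.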

(As a side note, a couple of people have asked why duplicated column names existed in the first place. In my case, I encountered duplicated column names when importing with the xlwings package (to deal with password-protected Excel files).)