Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19758364/

Date: 2020-08-19 14:30:56  Source: igfitidea

Rename specific column(s) in pandas

Tags: python, pandas, dataframe, rename

Asked by natsuki_2002

I've got a dataframe called data. How would I rename only one column header? For example, gdp to log(gdp)?

data =
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

Accepted answer by EdChum

data.rename(columns={'gdp':'log(gdp)'}, inplace=True)

The rename documentation shows that it accepts a dict as the columns parameter, so you just pass a dict with a single entry.
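
A minimal, self-contained sketch of the accepted answer. It rebuilds only the first three rows of the question's data frame (an assumption for illustration) and also shows the errors='raise' option of rename (available since pandas 0.25), which turns a misspelled key into a KeyError instead of a silent no-op.

```python
import pandas as pd

# First three rows of the question's `data` frame, rebuilt for illustration.
data = pd.DataFrame({'y': [1, 2, 8], 'gdp': [2, 3, 7], 'cap': [5, 9, 2]})

# A single-entry dict renames only that column; the others keep their names.
data.rename(columns={'gdp': 'log(gdp)'}, inplace=True)
print(list(data.columns))  # ['y', 'log(gdp)', 'cap']

# By default, keys that match no column are silently ignored.
unchanged = data.rename(columns={'gpd': 'typo'})
print(list(unchanged.columns))  # ['y', 'log(gdp)', 'cap']

# errors='raise' (pandas >= 0.25) reports the missing label instead.
try:
    data.rename(columns={'gpd': 'typo'}, errors='raise')
    raised_on_typo = False
except KeyError as exc:
    raised_on_typo = True
    print('KeyError:', exc)
```
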

Also see related

Answer by Nickil Maveli

A much faster implementation would be to use a list comprehension if you need to rename a single column.

df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]

If the need arises to rename multiple columns, either use conditional expressions like:

df.columns = ['log(gdp)' if x=='gdp' else 'cap_mod' if x=='cap' else x for x in df.columns]

Or, construct a mapping using a dictionary and perform the list comprehension with its get operation, using the old name as the default value:

col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'}   ## key→old name, value→new name

df.columns = [col_dict.get(x, x) for x in df.columns]
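
To check that the dict.get comprehension gives the same labels as rename with the same mapping, here is a small sketch on a made-up two-row frame with the question's column names.

```python
import pandas as pd

df = pd.DataFrame({'y': [1, 2], 'gdp': [2, 3], 'cap': [5, 9]})
col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'}  # key -> old name, value -> new name

# dict.get(x, x) falls back to the old name for columns not in the mapping.
via_listcomp = [col_dict.get(x, x) for x in df.columns]

# rename with the same mapping yields the same labels (on a new frame).
via_rename = list(df.rename(columns=col_dict).columns)

df.columns = via_listcomp
print(list(df.columns))  # ['y', 'log(gdp)', 'cap_mod']
```
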

Timings:

%%timeit
df.rename(columns={'gdp':'log(gdp)'}, inplace=True)
10000 loops, best of 3: 168 μs per loop

%%timeit
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
10000 loops, best of 3: 58.5 μs per loop
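
%%timeit is an IPython/Jupyter magic, so it does not work in a plain Python script. The sketch below reproduces the same two statements with the standard timeit module on a small made-up frame. As in the cells above, every call runs on the same frame, so after the first call there is no 'gdp' left to rename and both numbers mostly measure overhead. Results depend on your machine and pandas version, so no figures are claimed here.

```python
import timeit

import pandas as pd

# Toy frame with the question's column names (values are made up).
df = pd.DataFrame({'y': range(10), 'gdp': range(10), 'cap': range(10)})

def with_rename():
    # Same statement as the first %%timeit cell (mutates df in place).
    df.rename(columns={'gdp': 'log(gdp)'}, inplace=True)

def with_listcomp():
    # Same statement as the second %%timeit cell (reassigns df.columns).
    df.columns = ['log(gdp)' if x == 'gdp' else x for x in df.columns]

t_rename = timeit.timeit(with_rename, number=1000)
t_listcomp = timeit.timeit(with_listcomp, number=1000)

# 1000 calls: total seconds * 1000 gives microseconds per call.
print(f'rename:   {t_rename * 1000:.1f} us per call')
print(f'listcomp: {t_listcomp * 1000:.1f} us per call')
```
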

Answer by cs95

How do I rename a specific column in pandas?

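From v0.24+, to rename one (or more) columns at a time,

  • DataFrame.rename() with axis=1 or axis='columns' (the axis argument was introduced in v0.21).
  • Index.str.replace() for string/regex-based replacement.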

If you need to rename ALL columns at once,

  • DataFrame.set_axis() method with axis=1. Pass a list-like sequence. (Older versions also offered an in-place option; it was removed in pandas 2.0.)

rename with axis=1

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
df

   y gdp cap
0  x   x   x
1  x   x   x
2  x   x   x
3  x   x   x
4  x   x   x

With 0.21+, you can now specify an axis parameter with rename:

df.rename({'gdp':'log(gdp)'}, axis=1)
# df.rename({'gdp':'log(gdp)'}, axis='columns')

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

(Note that rename is not in-place by default, so you will need to assign the result back.)
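
A short sketch of that point, using the same example frame: rename returns a new DataFrame and leaves the original untouched until you assign the result back.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))

renamed = df.rename({'gdp': 'log(gdp)'}, axis=1)
print(list(df.columns))       # ['y', 'gdp', 'cap']  (original untouched)
print(list(renamed.columns))  # ['y', 'log(gdp)', 'cap']

# Assign the result back to keep the new name.
df = df.rename({'gdp': 'log(gdp)'}, axis=1)
print(list(df.columns))       # ['y', 'log(gdp)', 'cap']
```
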

This addition has been made to improve consistency with the rest of the API. The new axis argument is analogous to the columns parameter; they do the same thing.

df.rename(columns={'gdp': 'log(gdp)'})

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x
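
To confirm that the three spellings are interchangeable, a quick sketch comparing the resulting frames with DataFrame.equals, which checks both the values and the axis labels:

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
mapping = {'gdp': 'log(gdp)'}

a = df.rename(mapping, axis=1)
b = df.rename(mapping, axis='columns')
c = df.rename(columns=mapping)

# All three spellings produce identical frames.
print(a.equals(b) and b.equals(c))  # True
```
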

rename also accepts a callback that is called once for each column.

df.rename(lambda x: x[0], axis=1)
# df.rename(lambda x: x[0], axis='columns')

   y  g  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For this specific scenario, you would want to use

df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)
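
A runnable version of the callback approach on the same example frame. The second call is only an extra illustration that any str-to-str function, such as the built-in str.upper, can be passed as the callback.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))

# The callable is applied to every column label; returning x leaves it unchanged.
out = df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)
print(list(out.columns))    # ['y', 'log(gdp)', 'cap']

# Any str -> str function works, e.g. a built-in string method.
upper = df.rename(str.upper, axis=1)
print(list(upper.columns))  # ['Y', 'GDP', 'CAP']
```
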


Index.str.replace

Similar to the replace method of strings in Python, pandas Index and Series (object dtype only) define a ("vectorized") str.replace method for string and regex-based replacement.

df.columns = df.columns.str.replace('gdp', 'log(gdp)')
df

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

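The advantage of this over the other methods is that str.replace supports regex. Regex matching was enabled by default only in older versions; since pandas 2.0 the default is regex=False, so pass regex=True explicitly when you need a pattern. See the docs for more information.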
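
One caveat worth showing: str.replace works on substrings, so a literal pattern also rewrites any longer label that contains it. In this sketch the extra column gdp_growth is a hypothetical addition; anchoring the pattern with regex=True restricts the change to the exact label.

```python
import pandas as pd

# 'gdp_growth' is a made-up extra column to show substring matching.
df = pd.DataFrame('x', columns=['y', 'gdp', 'gdp_growth'], index=range(2))

# A literal pattern replaces every occurrence of the substring.
substr = df.columns.str.replace('gdp', 'log(gdp)', regex=False)
print(list(substr))  # ['y', 'log(gdp)', 'log(gdp)_growth']

# An anchored regex only matches the whole label 'gdp'.
exact = df.columns.str.replace(r'^gdp$', 'log(gdp)', regex=True)
print(list(exact))   # ['y', 'log(gdp)', 'gdp_growth']
```
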


Passing a list to set_axis with axis=1

Call set_axis with a list of header(s). The list must be equal in length to the columns/index size, because labels are assigned by position. Since pandas 1.0, set_axis returns a modified copy by default, so assign the result back. (In 0.21 to 0.25 it mutated the original DataFrame unless you passed inplace=False.)

df.set_axis(['y', 'log(gdp)', 'cap'], axis=1)
# df.set_axis(['y', 'log(gdp)', 'cap'], axis='columns')

   y log(gdp) cap
0  x        x   x
1  x        x   x
2  x        x   x
3  x        x   x
4  x        x   x

Note: the inplace argument of set_axis was deprecated in pandas 1.5 and removed in pandas 2.0, so passing inplace=False raises a TypeError there; drop the argument and assign the result instead.
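
A self-contained sketch of the positional behaviour, assuming pandas 1.0 or later (where set_axis returns a new object). A list whose length does not match the number of columns is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))

# set_axis relabels by position, so the list must name every column.
df = df.set_axis(['y', 'log(gdp)', 'cap'], axis=1)
print(list(df.columns))  # ['y', 'log(gdp)', 'cap']

# A list of the wrong length raises a ValueError (length mismatch).
try:
    df.set_axis(['only_one'], axis=1)
    length_mismatch_raised = False
except ValueError:
    length_mismatch_raised = True
```
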

Method Chaining
Why choose set_axis when we already have an efficient way of assigning columns with df.columns = ...? As shown by Ted Petrou in this answer (https://stackoverflow.com/a/46912050/4909087), set_axis is useful when trying to chain methods.

Compare

# new for pandas 0.21+ (some_method1/2/3 and columns are placeholders)
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

Versus

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()

The former is a more natural, free-flowing syntax.
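
A concrete version of the chained style, replacing the placeholders with real calls. The choice of assign, np.log and sort_values is only an illustration and assumes pandas 1.0 or later: the chain log-transforms gdp and relabels it without an intermediate variable or a separate df.columns = ... statement.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'y': [1, 2, 8], 'gdp': [2, 3, 7], 'cap': [5, 9, 2]})

result = (
    data
    .assign(gdp=lambda d: np.log(d['gdp']))     # overwrite gdp in place
    .set_axis(['y', 'log(gdp)', 'cap'], axis=1)  # relabel by position
    .sort_values('y', ascending=False)
)
print(result)
```
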

Answer by thdoan

There are at least five different ways to rename specific columns in pandas, and I have listed them below along with links to the original answers. I also timed these methods and found them to perform about the same (though YMMV depending on your data set and scenario). The test case below renames columns A, M, N and Z to A2, M2, N2 and Z2 in a dataframe with columns A to Z containing a million rows.

# Import required modules
import numpy as np
import pandas as pd
import timeit

# Create sample data
df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

# Standard way - https://stackoverflow.com/a/19758398/452587
def method_1():
    df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})

# Lambda function - https://stackoverflow.com/a/16770353/452587
def method_2():
    df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)

# Mapping function - https://stackoverflow.com/a/19758398/452587
def rename_some(x):
    if x=='A' or x=='M' or x=='N' or x=='Z':
        return x + '2'
    return x
def method_3():
    df_renamed = df.rename(columns=rename_some)

# Dictionary comprehension - https://stackoverflow.com/a/58143182/452587
def method_4():
    df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
        np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
    ]})

# Dictionary comprehension - https://stackoverflow.com/a/38101084/452587
def method_5():
    df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))

print('Method 1:', timeit.timeit(method_1, number=10))
print('Method 2:', timeit.timeit(method_2, number=10))
print('Method 3:', timeit.timeit(method_3, number=10))
print('Method 4:', timeit.timeit(method_4, number=10))
print('Method 5:', timeit.timeit(method_5, number=10))

Output:

Method 1: 3.650640267
Method 2: 3.163998427
Method 3: 2.998530871
Method 4: 2.9918436889999995
Method 5: 3.2436501520000007

Use the method that is most intuitive to you and easiest for you to implement in your application.
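
The benchmark only times the methods; it does not check that they rename the same columns. A quick sketch on a three-row stand-in frame confirms they agree. Note that method 4 above selects columns with a substring test ('A' in col), which works here only because every label is a single letter; with longer labels such as 'AB' it would also match, so the dict comprehension below uses exact labels instead (an adjustment, not the linked answer's code).

```python
import numpy as np
import pandas as pd

# Small stand-in for the benchmark frame: same A-Z columns, only 3 rows.
df = pd.DataFrame(np.zeros((3, 26), dtype=int), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
targets = ['A', 'M', 'N', 'Z']
expected = [c + '2' if c in targets else c for c in df.columns]

candidates = {
    'dict': df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'}),
    'lambda': df.rename(columns=lambda x: x + '2' if x in targets else x),
    'dict_comprehension': df.rename(columns={c: c + '2' for c in targets}),
    'zip': df.rename(columns=dict(zip(targets, ['A2', 'M2', 'N2', 'Z2']))),
}

for name, frame in candidates.items():
    print(name, list(frame.columns) == expected)
```
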

使用对您来说最直观且最容易在您的应用程序中实现的方法。