Rename specific column(s) in pandas
Original question: http://stackoverflow.com/questions/19758364/
License: this page is a translation of a popular StackOverflow question, provided under CC BY-SA 4.0. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Asked by natsuki_2002
I've got a dataframe called data. How would I rename only one column header? For example, gdp to log(gdp)?
data =
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7
Accepted answer by EdChum
Answer by Nickil Maveli
A much faster implementation would be to use a list comprehension if you need to rename a single column.
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
If you need to rename multiple columns, either chain conditional expressions like:
df.columns = ['log(gdp)' if x=='gdp' else 'cap_mod' if x=='cap' else x for x in df.columns]
Or, construct a mapping using a dictionary and perform the list comprehension with its get operation, setting the default value to the old name:
col_dict = {'gdp': 'log(gdp)', 'cap': 'cap_mod'} ## key→old name, value→new name
df.columns = [col_dict.get(x, x) for x in df.columns]
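As a minimal runnable sketch of the dictionary approach above, assuming a small illustrative frame with the same column names as the question (the data values are made up):

```python
import pandas as pd

# Illustrative data only; the column names match the question.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# key -> old name, value -> new name
col_dict = {"gdp": "log(gdp)", "cap": "cap_mod"}

# dict.get(x, x) falls back to the old name when x is not a key,
# so columns missing from the mapping are left untouched.
df.columns = [col_dict.get(x, x) for x in df.columns]

print(list(df.columns))  # ['y', 'log(gdp)', 'cap_mod']
```

Because the assignment replaces the whole column index, this form mutates df directly rather than returning a copy.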
Timings:
%%timeit
df.rename(columns={'gdp':'log(gdp)'}, inplace=True)
10000 loops, best of 3: 168 μs per loop
%%timeit
df.columns = ['log(gdp)' if x=='gdp' else x for x in df.columns]
10000 loops, best of 3: 58.5 μs per loop
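The %%timeit cells above only run inside IPython/Jupyter. A rough equivalent with the standard timeit module might look like the sketch below; the helper names (make_df, with_rename, with_list_comprehension) are hypothetical, and because each call also rebuilds the frame, only the difference between the two totals is meaningful. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def make_df():
    # Fresh frame on every call so neither approach sees already-renamed columns.
    return pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})


def with_rename():
    df = make_df()
    return df.rename(columns={"gdp": "log(gdp)"})


def with_list_comprehension():
    df = make_df()
    df.columns = ["log(gdp)" if x == "gdp" else x for x in df.columns]
    return df


# Both approaches should produce the same column labels.
same_columns = list(with_rename().columns) == list(with_list_comprehension().columns)

t_rename = timeit.timeit(with_rename, number=1000)
t_listcomp = timeit.timeit(with_list_comprehension, number=1000)
print(f"rename: {t_rename:.4f}s, list comprehension: {t_listcomp:.4f}s")
```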
Answer by cs95
How do I rename a specific column in pandas?
From v0.24+, to rename one (or more) columns at a time, use either DataFrame.rename() with axis=1 or axis='columns' (the axis argument was introduced in v0.21), or Index.str.replace() for string/regex-based replacement.
If you need to rename ALL columns at once, use the DataFrame.set_axis() method with axis=1. Pass a list-like sequence. Options are available for in-place modification as well.
rename with axis=1
df = pd.DataFrame('x', columns=['y', 'gdp', 'cap'], index=range(5))
df
y gdp cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
With 0.21+, you can now specify an axis parameter with rename:
df.rename({'gdp':'log(gdp)'}, axis=1)
# df.rename({'gdp':'log(gdp)'}, axis='columns')
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
(Note that rename is not in-place by default, so you will need to assign the result back.)
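To make the "assign the result back" point concrete, here is a small sketch using the same sample frame as above; the variable name renamed is only illustrative:

```python
import pandas as pd

df = pd.DataFrame("x", columns=["y", "gdp", "cap"], index=range(5))

# rename returns a new DataFrame; the original keeps its old labels.
renamed = df.rename({"gdp": "log(gdp)"}, axis=1)

print(list(df.columns))       # ['y', 'gdp', 'cap']
print(list(renamed.columns))  # ['y', 'log(gdp)', 'cap']
```

Writing df = df.rename(...) instead of introducing a new name is the usual way to keep the change.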
This addition has been made to improve consistency with the rest of the API. The new axis argument is analogous to the columns parameter; they do the same thing.
df.rename(columns={'gdp': 'log(gdp)'})
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
rename also accepts a callback that is called once for each column.
df.rename(lambda x: x[0], axis=1)
# df.rename(lambda x: x[0], axis='columns')
y g c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
For this specific scenario, you would want to use:
df.rename(lambda x: 'log(gdp)' if x == 'gdp' else x, axis=1)
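One caveat with the dictionary form of rename shown earlier: a key that matches no column is silently ignored by default. The sketch below assumes pandas 0.25 or later, where rename accepts errors='raise'; the misspelled key 'gpd' is deliberate.

```python
import pandas as pd

df = pd.DataFrame("x", columns=["y", "gdp", "cap"], index=range(5))

# A misspelled key is ignored by default, so nothing is renamed.
unchanged = df.rename(columns={"gpd": "log(gdp)"})
print(list(unchanged.columns))  # ['y', 'gdp', 'cap']

# With errors="raise", the same typo surfaces as a KeyError.
try:
    df.rename(columns={"gpd": "log(gdp)"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

The callback form has no equivalent check, since the function is simply applied to every label.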
Index.str.replace
Similar to the replace method of strings in Python, pandas Index and Series (object dtype only) define a ("vectorized") str.replace method for string and regex-based replacement.
df.columns = df.columns.str.replace('gdp', 'log(gdp)')
df
y log(gdp) cap
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
The advantage of this over the other methods is that str.replace supports regex. Regex matching was enabled by default before pandas 2.0; on newer versions the pattern is treated as a literal string unless you pass regex=True. See the docs for more information.
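Because str.replace works on substrings, it can touch more columns than intended. The sketch below passes regex explicitly so it behaves the same across pandas versions; the extra column gdp_per_cap is a hypothetical name added only to show the difference.

```python
import pandas as pd

# 'gdp_per_cap' is a made-up column that also contains the substring 'gdp'.
df = pd.DataFrame("x", columns=["y", "gdp", "gdp_per_cap"], index=range(2))

# Literal substring replacement rewrites every label containing 'gdp'.
literal = df.columns.str.replace("gdp", "log(gdp)", regex=False)
print(list(literal))   # ['y', 'log(gdp)', 'log(gdp)_per_cap']

# An anchored regex matches only the label that is exactly 'gdp'.
anchored = df.columns.str.replace(r"^gdp$", "log(gdp)", regex=True)
print(list(anchored))  # ['y', 'log(gdp)', 'gdp_per_cap']
```

Assigning either result back to df.columns applies the rename.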
Passing a list to set_axis with axis=1
Call set_axis with a list of header(s). The list must be equal in length to the columns/index size. In the pandas versions this answer targets (0.21 to 0.25), set_axis mutates the original DataFrame by default, but you can specify inplace=False to return a modified copy.
df.set_axis(['cap', 'log(gdp)', 'y'], axis=1, inplace=False)
# df.set_axis(['cap', 'log(gdp)', 'y'], axis='columns', inplace=False)
cap log(gdp) y
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
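Note: this default did not last. From pandas 1.0, set_axis returns a modified copy by default, and pandas 2.0 removed the inplace argument altogether, so on current versions drop inplace=False from the call above.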
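For reference, a version of the set_axis call that should run unchanged on recent releases, assuming pandas 1.0 or later where the result is returned rather than applied in place:

```python
import pandas as pd

df = pd.DataFrame("x", columns=["y", "gdp", "cap"], index=range(5))

# set_axis relabels columns by position; it does not reorder the data.
relabeled = df.set_axis(["cap", "log(gdp)", "y"], axis=1)

print(list(df.columns))         # ['y', 'gdp', 'cap']
print(list(relabeled.columns))  # ['cap', 'log(gdp)', 'y']
```

Note that the list must name every column, which is why rename is usually the better fit when only one label changes.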
Method Chaining
Why choose set_axis when we already have an efficient way of assigning columns with df.columns = ...? As shown by Ted Petrou in this answer (https://stackoverflow.com/a/46912050/4909087), set_axis is useful when trying to chain methods.
Compare
# new for pandas 0.21+ (some_method1..3 are placeholders for real DataFrame methods)
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())
Versus
# old way: the chain has to be broken to assign the columns
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
The former is a more natural, free-flowing syntax.
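A concrete chain might look like the sketch below. The filter and sort steps are illustrative stand-ins for the placeholder methods, the data values are made up, and the call assumes pandas 1.0 or later, where set_axis returns a new object.

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

result = (
    df[df["y"] > 1]                               # keep rows where y > 1
    .set_axis(["y", "log(gdp)", "cap"], axis=1)   # relabel without leaving the chain
    .sort_values("log(gdp)", ascending=False)     # later steps can use the new label
    .reset_index(drop=True)
)

print(result)
```

The original df is left untouched, so the whole pipeline can be rerun or adjusted without side effects.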
Answer by thdoan
There are at least five different ways to rename specific columns in pandas, and I have listed them below along with links to the original answers. I also timed these methods and found them to perform about the same (though YMMV depending on your data set and scenario). The test case below renames columns A, M, N, Z to A2, M2, N2, Z2 in a dataframe with columns A to Z containing a million rows.
# Import required modules
import numpy as np
import pandas as pd
import timeit

# Create sample data
df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

# Standard way - https://stackoverflow.com/a/19758398/452587
def method_1():
    df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})

# Lambda function - https://stackoverflow.com/a/16770353/452587
def method_2():
    df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)

# Mapping function - https://stackoverflow.com/a/19758398/452587
def rename_some(x):
    if x=='A' or x=='M' or x=='N' or x=='Z':
        return x + '2'
    return x

def method_3():
    df_renamed = df.rename(columns=rename_some)

# Dictionary comprehension - https://stackoverflow.com/a/58143182/452587
def method_4():
    df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
        np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
    ]})

# Dictionary comprehension - https://stackoverflow.com/a/38101084/452587
def method_5():
    df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))

print('Method 1:', timeit.timeit(method_1, number=10))
print('Method 2:', timeit.timeit(method_2, number=10))
print('Method 3:', timeit.timeit(method_3, number=10))
print('Method 4:', timeit.timeit(method_4, number=10))
print('Method 5:', timeit.timeit(method_5, number=10))
Output:
Method 1: 3.650640267
Method 2: 3.163998427
Method 3: 2.998530871
Method 4: 2.9918436889999995
Method 5: 3.2436501520000007
Use the method that is most intuitive to you and easiest for you to implement in your application.
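Since the benchmark discards each result, it does not confirm that the variants agree. A quick check on a tiny frame, using simplified forms of four of the approaches above (the names targets, results and expected are illustrative), could look like this:

```python
import numpy as np
import pandas as pd

# Two rows are enough; only the column labels matter for this check.
df = pd.DataFrame(np.zeros((2, 26), dtype=int), columns=list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
targets = ["A", "M", "N", "Z"]

results = [
    # explicit dictionary
    df.rename(columns={"A": "A2", "M": "M2", "N": "N2", "Z": "Z2"}),
    # lambda callback
    df.rename(columns=lambda x: x + "2" if x in targets else x),
    # dictionary comprehension
    df.rename(columns={col: col + "2" for col in df.columns if col in targets}),
    # dict(zip(...)) of old and new names
    df.rename(columns=dict(zip(targets, ["A2", "M2", "N2", "Z2"]))),
]

expected = [c + "2" if c in targets else c for c in df.columns]
all_equal = all(list(r.columns) == expected for r in results)
print(all_equal)  # True
```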