Remove last two characters from column names of all the columns in Dataframe - Pandas

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37061541/

Time: 2020-09-14 01:11:05  Source: igfitidea


Tags: python, string, pandas, dataframe

Asked by Observer

I am joining two dataframes (a, b) that have identical column names on the user ID key, and to get the join to work I had to supply a suffix. This is the command I used:


a.join(b,how='inner', on='userId',lsuffix="_1")

If I don't use this suffix, I get an error. But I don't want the column names to change, because that causes problems when running other analyses. So I want to remove the "_1" suffix from all the column names of the resulting dataframe. Can anybody suggest an efficient way to remove the last two characters from the names of all the columns in a Pandas dataframe?


Thanks


Answered by Thtu

This snippet should get the job done:


df.columns = pd.Index(map(lambda x : str(x)[:-2], df.columns))

Edit: This is a better way to do it:


df = df.rename(columns=lambda x: str(x)[:-2])  # rename returns a new DataFrame, so assign it back

In both cases, all we're doing is iterating through the columns and applying a function to each name. Here, the function converts the name to a string and keeps everything except the last two characters. Note that this trims every column name, including any that do not carry the suffix.
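
If only some columns carry the suffix, the same rename approach can be made conditional. The following is a minimal sketch, not part of the original answer: the frame, its column names, and the helper `drop_suffix` are hypothetical and only stand in for the joined result.

```python
import pandas as pd

# Hypothetical frame standing in for the joined result.
df = pd.DataFrame({"userId_1": [1, 2], "score_1": [0.5, 0.7], "age": [30, 41]})


def drop_suffix(name, suffix="_1"):
    """Return the name without the suffix if it ends with it, else unchanged."""
    name = str(name)
    return name[: -len(suffix)] if name.endswith(suffix) else name


# rename calls drop_suffix once per column label and returns a new DataFrame.
renamed = df.rename(columns=drop_suffix)
print(list(renamed.columns))
```

Because the helper checks `endswith` before slicing, a column such as `age` keeps its full name, whereas the unconditional `[:-2]` slice would shorten it.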


I'm sure there are a few other ways you could do this.


Answered by aydow

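You could use str.rstrip like so (note that rstrip removes any trailing run of the characters '_' and '1', not the literal suffix "_1"):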


In [214]: import functools as ft
     ...: import numpy as np
     ...: import pandas as pd

In [215]: f = ft.partial(np.random.choice, *[5, 3])

In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})

In [226]: df
Out[226]:
   a  b  c  a_1  b_1  c_1
0  4  2  0    2    3    2
1  0  0  3    2    1    1
2  4  0  4    4    4    3

In [227]: df.columns = df.columns.str.rstrip('_1')

In [228]: df
Out[228]:
   a  b  c  a  b  c
0  4  2  0  2  3  2
1  0  0  3  2  1  1
2  4  0  4  4  4  3
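
One thing to watch with this approach: `str.rstrip` treats its argument as a set of characters. The sketch below uses made-up column names (not from the original answer) to show the difference from an anchored regex replacement, which is one possible alternative when only a literal trailing "_1" should go.

```python
import pandas as pd

# Hypothetical column names chosen to expose the difference.
cols = pd.Index(["a_1", "x1_1", "total_", "b"])

# rstrip strips any trailing '_' or '1' characters, not the literal "_1".
stripped = cols.str.rstrip("_1")

# An anchored regex removes only a trailing "_1" and leaves other names alone.
anchored = cols.str.replace(r"_1$", "", regex=True)

print(list(stripped))
print(list(anchored))
```

With `rstrip`, `x1_1` collapses to `x` and `total_` loses its underscore; the anchored replacement yields `x1` and leaves `total_` untouched.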

However, if you need something more flexible (albeit probably a bit slower), you can use str.extract which, with the power of regexes, lets you select which part of the column name to keep:


In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})

In [217]: df
Out[217]:
   a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
0    0    1    0    2    2    4    0    0    3
1    0    0    3    1    4    2    4    3    2
2    2    0    1    0    0    2    2    2    1

In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]

In [224]: df
Out[224]:
0  a  b  c  a  b  c  a  b  c
0  1  1  0  0  0  2  1  1  2
1  1  0  1  0  1  2  0  4  1
2  1  3  1  3  4  2  0  1  1
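
A caveat worth sketching: `str.extract` returns NaN for any name the pattern does not match, so a column without a numeric suffix would lose its name. The example below is an illustration under assumed column names (`userId` is hypothetical here), falling back to the original name where there is no match.

```python
import pandas as pd

# Hypothetical names: two with a numeric suffix, one without.
original = pd.Series(["a_0", "b_12", "userId"])

# Capture everything before a trailing "_<digits>"; non-matching names give NaN.
extracted = original.str.extract(r"(.*)_\d+$")[0]

# Fall back to the original name wherever the pattern did not match.
cleaned = extracted.fillna(original)

print(cleaned.tolist())
```

The result could then be assigned with something like `df.columns = cleaned.tolist()`, assuming the series was built from `df.columns` in the same order.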

The idea to use df.columns.str came from this answer.
