Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/37952797/

Time: 2020-09-14 01:26:15  Source: igfitidea

pandas dataframe column name: remove special character

python-3.x pandas special-characters

Asked by Paul Podbielski

Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).

abbreviated sample:

KA#         Issue Date      Current Position
27144       1/9/2014        Accounting
27194       12/20/2012      Engineering
32474       4/21/2008       Engineering
32623-HOLD  4/25/2016       Engineering
32745       11/13/2012      SEPE
32812       10/30/2013      Engineering
32817       12/7/2012       Purchasing
32839       1/8/2013        SEPE

I export this table (4K rows, 15 columns) to a CSV file and process it in Python 3 as a pandas DataFrame.

I generate various outputs. If I use something like:

df.iloc[:,[0,3,1,8,9,10]]

I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important"... NOT "index". I keep a serial index)

Unfortunately, people sometimes change the column order in Lotus between my exports to CSV, so I cannot guarantee that "KA#" will be at any particular column position. I would like to use column names instead:

df.loc[:,["KA#","Issue Date","Current Position"]]

But the "KA#" column is filled with NaNs.

Thanks for any help you can offer.

Finally, if I try to rename "KA#" to simply "KA":

df['KA#'].name = 'KA'

throws a KeyError and

df = df.rename(columns={"KA#": "ka"})

is completely ignored. The column shows up as "KA#".

Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.
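The symptoms above (a KeyError on `df['KA#']`, a rename that has no effect, NaNs from a label-based selection) are consistent with the stored column label not being exactly "KA#". The question does not confirm the cause, so the following is only a diagnostic sketch under an assumed scenario: a hypothetical header with a trailing space, built by hand rather than read from the real export.

```python
import pandas as pd

# Hypothetical data: the first label has a trailing space, so it only
# *looks* like "KA#" when the frame is printed. This is an assumption,
# not something the question confirms.
df = pd.DataFrame({
    "KA# ": ["27144", "32623-HOLD"],
    "Issue Date": ["1/9/2014", "4/25/2016"],
    "Current Position": ["Accounting", "Engineering"],
})

# repr() exposes whitespace and other characters a normal print hides.
labels = [repr(col) for col in df.columns]
print(labels)

# An exact-match lookup fails because the stored label is not literally "KA#".
has_exact_label = "KA#" in df.columns

# rename() skips keys it cannot find (errors="ignore" is the default),
# so the column names come back unchanged.
renamed = df.rename(columns={"KA#": "ka"})

# One possible cleanup: strip surrounding whitespace from every label.
df.columns = df.columns.str.strip()
```

If the printed labels show something other than a plain "KA#", cleaning the labels first should make the name-based selection and the rename behave as expected.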

Answered by shivsn

Use str.replace on the column index:
df.columns = df.columns.str.replace('#', '')

You can check this in the documentation.
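As a minimal sketch of the answer's approach, the example below applies `str.replace` to a small hand-made frame modelled on the sample in the question and then selects columns by name. The `regex=False` argument and the `clean_label` helper are additions for illustration (the helper is hypothetical, not part of pandas): it shows one way to strip other non-alphanumeric characters as well, in case "#" is not the only problem.

```python
import re

import pandas as pd

# Small stand-in for the exported table (values taken from the sample above).
df = pd.DataFrame({
    "KA#": ["27144", "32623-HOLD"],
    "Issue Date": ["1/9/2014", "4/25/2016"],
    "Current Position": ["Accounting", "Engineering"],
})

# The answer's approach: drop every literal '#' from the column labels.
# regex=False makes it explicit that '#' is treated as plain text.
df.columns = df.columns.str.replace("#", "", regex=False)

# Columns can now be selected by name, regardless of their position.
subset = df.loc[:, ["KA", "Issue Date", "Current Position"]]


def clean_label(label: str) -> str:
    """Hypothetical helper: keep only word characters and spaces, then trim."""
    return re.sub(r"[^\w ]", "", label).strip()


# Broader variant applied to a few example labels.
cleaned = [clean_label(c) for c in ["KA#", " Issue Date ", "Current Position"]]
```

The broader variant also removes punctuation you might want to keep (hyphens, for instance), so the character class would need adjusting to the actual headers.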
