Original source: http://stackoverflow.com/questions/37952797/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
pandas dataframe column name: remove special character
Asked by Paul Podbielski
Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).
Abbreviated sample:
KA#         Issue Date  Current Position
27144       1/9/2014    Accounting
27194       12/20/2012  Engineering
32474       4/21/2008   Engineering
32623-HOLD  4/25/2016   Engineering
32745       11/13/2012  SEPE
32812       10/30/2013  Engineering
32817       12/7/2012   Purchasing
32839       1/8/2013    SEPE
I export this table (4K rows, 15 columns) to a CSV file and process it in Python 3 as a pandas DataFrame.
I generate various outputs. If I use something like:
df.iloc[:,[0,3,1,8,9,10]]
I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important", NOT "index". I keep a serial index.)
Unfortunately, people sometimes change the column order in Lotus between my exports to CSV, so I cannot guarantee that "KA#" will be at any particular column position. I would like to use column names:
df.loc[:,["KA#","Issue Date","Current Position"]]
But the "KA#" column is filled with NaNs.
Thanks for any help you can offer.
Finally, if I try to rename "KA#" to simply "KA":
df['KA#'].name = 'KA'
throws a KeyError and
df = df.rename(columns={"KA#": "ka"})
is completely ignored. The column still shows up as "KA#".
Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.
Answered by shivsn
Use str.replace on the column index:
df.columns = df.columns.str.replace('#', '')
You can check this in the documentation.
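For anyone who wants to try this end to end, here is a minimal sketch. It assumes the mismatch comes from an invisible character in the header (a trailing space here). That is only a guess: the question never confirms the actual cause, and the inline CSV is a made-up stand-in for the Lotus export. Printing the repr of each column label is a quick way to see what the name really contains before cleaning it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Lotus export. The trailing space after
# "KA#" is an assumed cause of the failed lookups, not a confirmed one.
csv_text = (
    "KA# ,Issue Date,Current Position\n"
    "27144,1/9/2014,Accounting\n"
    "32623-HOLD,4/25/2016,Engineering\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Show the exact column labels, including any invisible characters.
print([repr(name) for name in df.columns])

# Remove '#' as a literal string (regex=False), then strip whitespace.
df.columns = df.columns.str.replace("#", "", regex=False).str.strip()

# Selecting by name now works regardless of column order in the export.
subset = df.loc[:, ["KA", "Issue Date", "Current Position"]]
print(subset)
```

Passing regex=False makes it explicit that '#' is matched literally. The extra .str.strip() only matters if the header really does carry stray whitespace, so drop it if the repr output shows clean names.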