pandas DataFrame column names: remove special characters

Question

Asked by Paul Podbielski

Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).

abbreviated sample:

KA#         Issue Date      Current Position
27144       1/9/2014        Accounting
27194       12/20/2012      Engineering
32474       4/21/2008       Engineering
32623-HOLD  4/25/2016       Engineering
32745       11/13/2012      SEPE
32812       10/30/2013      Engineering
32817       12/7/2012       Purchasing
32839       1/8/2013        SEPE

I export this table (4K rows, 15 columns) to a CSV file and process it in Python 3 as a pandas DataFrame.
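A minimal sketch for checking the exact column labels right after loading the CSV. The inline CSV text is a hypothetical stand-in for the Lotus export, and the trailing space after `KA#` is an assumed hidden character, not something the question confirms. `repr()` makes such characters visible where a normal printout does not.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported CSV. The space after "KA#" is an
# ASSUMED hidden character, used only to illustrate the diagnosis.
csv_text = (
    "KA# ,Issue Date,Current Position\n"
    "27144,1/9/2014,Accounting\n"
    "32623-HOLD,4/25/2016,Engineering\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# repr() shows characters that a normal printout hides, e.g. spaces or '\ufeff'.
for position, label in enumerate(df.columns):
    print(position, repr(label))
```

With this sample the first line printed is `0 'KA# '`, so any lookup for the exact string `"KA#"` would miss that column.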

I generate various outputs. If I use something like:

df.iloc[:,[0,3,1,8,9,10]]

I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important"... NOT "index". I keep a serial index.)

Unfortunately, people sometimes mess with the column order in Lotus between my exports to CSV, so I cannot guarantee that "KA#" will be at any particular column position. I would like to use column names:

df.loc[:,["KA#","Issue Date","Current Position"]]

But the "KA#" column is filled with NaNs.
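Why NaN rather than an error: in older pandas versions, `.loc` with a list containing a label that did not exist returned an all-NaN column for that label; newer versions raise `KeyError` instead. `reindex` still fills missing labels with NaN, so the sketch below uses it to reproduce the symptom, assuming (hypothetically) that the real label carries a trailing space. The second half is a hedged workaround: select with a boolean mask from a substring match, which depends neither on the exact label nor on column order.

```python
import pandas as pd

# ASSUMED: the real label is "KA# " (trailing space), not "KA#".
df = pd.DataFrame(
    {
        "Issue Date": ["1/9/2014", "12/20/2012"],
        "KA# ": ["27144", "27194"],
        "Current Position": ["Accounting", "Engineering"],
    }
)

# Asking for a label that is not present yields an all-NaN column.
wanted = ["KA#", "Issue Date", "Current Position"]
by_name = df.reindex(columns=wanted)
print(by_name)

# Workaround: match on a substring instead of the exact label, so the
# column is found wherever it sits and whatever trails the "#".
mask = df.columns.str.contains("KA#", regex=False)
key_column = df.loc[:, mask]
print(key_column)
```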

Thanks for any help you can offer.

Finally, if I try to rename "KA#" to simply "KA":

df['KA#'].name = 'KA'

throws a KeyError and

df = df.rename(columns={"KA#": "ka"})

is completely ignored. The column shows up as "KA#".
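Both failures are consistent with the label not being exactly "KA#". `DataFrame.rename` skips mapping keys that match no label, which is why it appears to be ignored; in recent pandas versions, passing `errors="raise"` turns that into a `KeyError`. Separately, assigning `.name` on a selected column would not rename the DataFrame column even if the lookup succeeded, because `df.columns` is a separate Index. A sketch, again assuming a hypothetical trailing space in the label:

```python
import pandas as pd

# ASSUMED: the real label is "KA# " (trailing space), not "KA#".
df = pd.DataFrame({"KA# ": ["27144", "27194"], "Issue Date": ["1/9/2014", "12/20/2012"]})

# rename() silently skips keys that do not match any label...
renamed = df.rename(columns={"KA#": "KA"})
print(list(renamed.columns))  # still contains "KA# "

# ...while errors="raise" reports the mismatch instead.
try:
    df.rename(columns={"KA#": "KA"}, errors="raise")
except KeyError as exc:
    print("no such label:", exc)

# Setting .name on a selected column only renames that Series object;
# the DataFrame's own column labels are left as they were.
series = df["KA# "]
series.name = "KA"
print(list(df.columns))
```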

Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.
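Since a regex is acceptable, here is a hedged sketch that normalizes every label in one pass: strip surrounding whitespace, then delete anything that is not a word character or a plain space. The sample labels, with a byte-order mark (`\ufeff`) and a non-breaking space around "KA#", are assumptions for illustration; whether the real export contains either is not known. The pattern also removes other punctuation from the remaining column names, which may or may not be wanted.

```python
import pandas as pd

# HYPOTHETICAL labels: a BOM before "KA#" and a non-breaking space after it.
df = pd.DataFrame(
    columns=["\ufeffKA#\u00a0", "Issue Date", "Current Position"],
    data=[["27144", "1/9/2014", "Accounting"]],
)

# 1) strip() removes Unicode whitespace at both ends (including '\u00a0').
# 2) The regex deletes every character that is not a word character or a
#    plain space, which takes out '#' and the invisible '\ufeff'.
df.columns = df.columns.str.strip().str.replace(r"[^\w ]", "", regex=True)

print(list(df.columns))
print(df.loc[:, ["KA", "Issue Date"]])
```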

Answer 1

Answered by shivsn

Use str.replace:

df.columns = df.columns.str.replace('#', '')

You can check this in the pandas documentation for str.replace.
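A variant of the same idea, offered as a sketch rather than as part of the original answer: `DataFrame.rename` also accepts a function, which is applied to every column label, so the cleanup works without knowing the exact original label. The sample frames are hypothetical, and `regex=False` is added only to make the literal match explicit.

```python
import pandas as pd

# Hypothetical sample frame with a clean "KA#" label.
df = pd.DataFrame(
    {"KA#": ["27144", "32623-HOLD"], "Issue Date": ["1/9/2014", "4/25/2016"]}
)

# The answer's approach: a literal (non-regex) replacement over all labels.
df.columns = df.columns.str.replace("#", "", regex=False)
print(list(df.columns))

# Equivalent via rename(): the function is called once per label,
# here also stripping stray whitespace (an assumed extra problem).
other = pd.DataFrame({"KA# ": ["27144"], "Issue Date": ["1/9/2014"]})
other = other.rename(columns=lambda label: label.replace("#", "").strip())
print(list(other.columns))
```

The callable form is convenient when several labels need the same treatment, since no exact old-to-new mapping has to be typed out.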
