Remove a prefix (or suffix) substring from column headers in pandas

Question

Asked by user9185511

I'm trying to remove the substring _x that appears at the end of some of my df column names.

Sample df code:

import pandas as pd

d = {'W_x': ['abcde','abcde','abcde']}
df = pd.DataFrame(data=d)

df['First_x']=[0,0,0]
df['Last_x']=[1,2,3]
df['Slice']=['abFC=0.01#%sdadf','12fdak*4%FC=-0.035faf,dd43','FC=0.5fasff']

Output:

     W_x  First_x  Last_x                       Slice
0  abcde        0       1            abFC=0.01#%sdadf
1  abcde        0       2  12fdak*4%FC=-0.035faf,dd43
2  abcde        0       3                 FC=0.5fasff

Desired output:

       W  First  Last                       Slice
0  abcde      0     1            abFC=0.01#%sdadf
1  abcde      0     2  12fdak*4%FC=-0.035faf,dd43
2  abcde      0     3                 FC=0.5fasff

Answer 1

Accepted answer by dmontaner

I usually use @cs95's approach, but wrap it in a DataFrame method for convenience:

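import pandas as pd

def drop_prefix(self, prefix):
    # Remove the exact prefix, and only from names that start with it.
    # (str.lstrip(prefix) would strip any leading characters found in `prefix`.)
    # Assumes the column names are strings.
    self.columns = [c[len(prefix):] if c.startswith(prefix) else c
                    for c in self.columns]
    return self

# Attach the helper to DataFrame so it can be called as df.drop_prefix(...)
pd.DataFrame.drop_prefix = drop_prefix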

You can then use it just like its inverse, add_prefix, which pandas already provides:

df.drop_prefix('myprefix_')
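As a side note on prefix removal: str.lstrip treats its argument as a set of characters, not as a literal prefix, so it can eat into the real column name. The sketch below uses plain Python strings to show the difference; the names list and the standalone drop_prefix function are made up for illustration and are not part of pandas.

```python
def drop_prefix(name, prefix):
    """Return name without prefix, but only if name really starts with it."""
    return name[len(prefix):] if name.startswith(prefix) else name

# Hypothetical column names, used only for this illustration.
names = ['myprefix_price', 'myprefix_size', 'year']

# lstrip removes leading characters from the set {m, y, p, r, e, f, i, x, _}.
naive = [n.lstrip('myprefix_') for n in names]
safe = [drop_prefix(n, 'myprefix_') for n in names]

print(naive)  # ['ce', 'size', 'ar']
print(safe)   # ['price', 'size', 'year']
```

With a DataFrame, the same function could be passed through rename, for example df.rename(columns=lambda c: drop_prefix(c, 'myprefix_')).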

Answer 2

Answer by cs95

Use str.strip/rstrip:

# df.columns = df.columns.str.strip('_x')
# Or, 
df.columns = df.columns.str.rstrip('_x')  # strip suffix at the right end only.

df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')

To avoid the issue highlighted in the comments:

Beware of strip() if any column name starts or ends with either _ or x beyond the suffix.

You could use str.replace with a regular expression anchored at the end of the name:

df.columns = df.columns.str.replace(r'_x$', '', regex=True)  # regex=True is required in pandas 2.0+, where the default is literal matching

df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')
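The caveat about strip() can be reproduced without a DataFrame. The sketch below compares strip('_x') with a regex anchored at the end of the string, using Python's re module on a plain list; the extra names max_x and x_value are hypothetical and were added only to trigger the problem.

```python
import re

# 'max_x' and 'x_value' are made-up names that expose the strip() pitfall.
names = ['W_x', 'First_x', 'max_x', 'x_value', 'Slice']

# strip('_x') removes any leading/trailing '_' or 'x' characters.
stripped = [n.strip('_x') for n in names]

# The anchored pattern removes '_x' only when it is the very end of the name.
anchored = [re.sub(r'_x$', '', n) for n in names]

print(stripped)  # ['W', 'First', 'ma', 'value', 'Slice']
print(anchored)  # ['W', 'First', 'max', 'x_value', 'Slice']
```

The anchored pattern leaves names like x_value alone, which is the behavior the str.replace variant aims for.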

Answer 3

Answer by Quang Hoang

df.columns = [col[:-2] if col[-2:] == '_x' else col for col in df.columns]

or

df.columns = [col.replace('_x', '') for col in df.columns]

Answer 4

Answer by Quickbeam2k1

I'd suggest using the rename function:

df.rename(columns = lambda x: x.strip('_x'))

The output is as desired.

Of course, you can also take FabienP's comment into account and modify it along the lines of Quang Hoang's solution:

import re  # plain str.replace does not understand regex patterns such as '_x$'
df.rename(columns = lambda x: re.sub(r'_x$', '', x))

gives the desired output.

Another solution is simply:

df.rename(columns = lambda x: x[:-2] if x.endswith('_x') else x)
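On Python 3.9 or newer, the endswith-and-slice pattern can also be written with the built-in str.removeprefix and str.removesuffix methods. The following is a minimal sketch on a plain list of names; strip_affix is a hypothetical helper name, and x_axis_x is a made-up column added to show that only the exact trailing suffix is removed.

```python
def strip_affix(name, prefix='', suffix=''):
    """Return name without the exact prefix and/or suffix (needs Python 3.9+)."""
    return name.removeprefix(prefix).removesuffix(suffix)

# 'x_axis_x' is a made-up name; the others come from the question.
names = ['W_x', 'First_x', 'Last_x', 'Slice', 'x_axis_x']

renamed = [strip_affix(n, suffix='_x') for n in names]
print(renamed)  # ['W', 'First', 'Last', 'Slice', 'x_axis']
```

The helper could be used with a DataFrame as df.rename(columns=lambda c: strip_affix(c, suffix='_x')), which returns a new DataFrame and leaves df unchanged.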
