pandas: removing characters from a string in pandas

Question

Asked by A Rob4

I have a similar question to this one: Pandas DataFrame: remove unwanted parts from strings in a column.

So I used:

temp_dataframe['PPI'] = temp_dataframe['PPI'].map(lambda x: x.lstrip('PPI/'))

Most of the items start with 'PPI/', but not all. It seems that when an item without the 'PPI/' prefix is encountered, I get this error:

AttributeError: 'float' object has no attribute 'lstrip'

Am I missing something here?

Answer 1

Answered by shivsn

use replace:

temp_dataframe['PPI'] = temp_dataframe['PPI'].replace('PPI/', '', regex=True)

or str.replace:

temp_dataframe['PPI'] = temp_dataframe['PPI'].str.replace('PPI/', '')
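
Both calls above remove 'PPI/' wherever it occurs in the string, not only at the start. If only a leading 'PPI/' should go, the pattern can be anchored. A minimal sketch, assuming a small made-up Series (the sample values are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one value without the prefix, one missing,
# and one with 'PPI/' in the middle of the string.
s = pd.Series(['PPI/abc', 'xyz', np.nan, 'a/PPI/b'])

# '^' anchors the match to the start, so an inner 'PPI/' is left alone.
# regex=True is passed explicitly because the default differs between pandas versions.
cleaned = s.str.replace(r'^PPI/', '', regex=True)
print(cleaned.tolist())  # 'abc', 'xyz', NaN, 'a/PPI/b'
```

The missing value passes through as NaN, so no fillna is needed first.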

Answer 2

Answered by EdChum

use vectorised str.lstrip:

temp_dataframe['PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')

it looks like you may have missing values so you should mask those out or replace them:

temp_dataframe['PPI'] = temp_dataframe['PPI'].fillna('')

or

temp_dataframe.loc[temp_dataframe['PPI'].notnull(), 'PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')
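
To see why the original map call fails, here is a small sketch on made-up data (the values and the object dtype are assumptions for illustration). A missing value is stored as NaN, which is a float, so a plain str method cannot be called on it, while the .str accessor skips it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series(['PPI/abc', np.nan], dtype=object)

# NaN is a float, so x.lstrip(...) inside map() raises AttributeError.
try:
    s.map(lambda x: x.lstrip('PPI/'))
    raised = False
except AttributeError:
    raised = True
print(raised)  # True

# The vectorised .str accessor leaves the missing value as NaN instead of raising.
result = s.str.lstrip('PPI/')
print(result.tolist())  # 'abc', NaN
```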

Maybe a better method is to filter using str.startswith, then use split and take the part of the string after the prefix you want to remove. Passing na=False makes missing values count as non-matches, so the boolean mask contains no NaN:

temp_dataframe.loc[temp_dataframe['PPI'].str.startswith('PPI/', na=False), 'PPI'] = temp_dataframe['PPI'].str.split('PPI/').str[1]

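As @JonClements pointed out, lstrip does not remove the prefix as a whole: it strips any leading characters that appear in the argument (here 'P', 'I' and '/'), and with no argument it strips whitespace. Removing the exact prefix is what you're after.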
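
A short sketch of the difference, on made-up values chosen to expose it. The use of str.removeprefix here is an addition to the original answer and assumes pandas 1.4 or later:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(['PPI/IP/1', 'PIP/2', 'PPI/3'])

# lstrip treats 'PPI/' as the character set {'P', 'I', '/'} and strips
# every leading character that belongs to it.
stripped = s.str.lstrip('PPI/')
print(stripped.tolist())  # ['1', '2', '3']

# removeprefix removes the exact prefix only, and only when it is present.
exact = s.str.removeprefix('PPI/')
print(exact.tolist())     # ['IP/1', 'PIP/2', '3']
```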

update

Another method is to pass a regex pattern that looks for the optional prefix and extracts all characters after it:

temp_dataframe['PPI'].str.extract('(?:PPI/)?(.*)', expand=False)
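
A quick check of that pattern on made-up data (the sample values are assumptions): the non-capturing group (?:PPI/)? consumes the prefix when it is there, and (.*) captures whatever follows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: with prefix, without prefix, and missing.
s = pd.Series(['PPI/abc', 'xyz', np.nan])

# expand=False returns a Series rather than a one-column DataFrame.
out = s.str.extract(r'(?:PPI/)?(.*)', expand=False)
print(out.tolist())  # 'abc', 'xyz', NaN
```

As with the other .str methods, the result has to be assigned back to the column to take effect.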
