Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/37919479/

Date: 2020-09-14 01:25:17  Source: igfitidea

Removing characters from a string in pandas

python pandas

Asked by A Rob4

I have a similar question to this one: Pandas DataFrame: remove unwanted parts from strings in a column.

So I used:

temp_dataframe['PPI'] = temp_dataframe['PPI'].map(lambda x: x.lstrip('PPI/'))

Most of the items start with 'PPI/', but not all. It seems that when an item without the 'PPI/' prefix is encountered, I get this error:

AttributeError: 'float' object has no attribute 'lstrip'

Am I missing something here?

Answered by shivsn

use replace:

temp_dataframe['PPI'].replace('PPI/','',regex=True,inplace=True)

or str.replace:

temp_dataframe['PPI'].str.replace('PPI/','')
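
A minimal sketch of the difference between replacing 'PPI/' anywhere and replacing it only at the start of each value. The sample values are made up for illustration, and `regex` is passed explicitly because the default of `Series.str.replace` changed from `True` to `False` in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a prefixed value, a value with 'PPI/' in the
# middle, an unprefixed value, and a missing value.
s = pd.Series(["PPI/123", "A-PPI/9", "456", np.nan])

# Literal replacement removes 'PPI/' wherever it occurs in the string.
anywhere = s.str.replace("PPI/", "", regex=False)

# An anchored regex removes 'PPI/' only at the start of each string.
leading_only = s.str.replace(r"^PPI/", "", regex=True)

print(anywhere.tolist())
print(leading_only.tolist())
```

Note that `str.replace` returns a new Series, so the result has to be assigned back to the column to take effect. Missing values pass through as missing rather than raising.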

Answered by EdChum

use vectorised str.lstrip:

temp_dataframe['PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')

it looks like you may have missing values so you should mask those out or replace them:

temp_dataframe['PPI'].fillna('', inplace=True)

or

temp_dataframe.loc[temp_dataframe['PPI'].notnull(), 'PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')
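
To make the missing-value problem concrete, here is a small sketch that reproduces the error from the question and then applies the two workarounds above. The Series `s` and its values are assumptions for illustration; `dtype=object` is set explicitly so the missing entry is stored as a float NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column; dtype=object keeps the missing value as a float NaN.
s = pd.Series(["PPI/123", "456", np.nan], dtype=object)

# Calling a plain str method on every element fails on the NaN entry.
try:
    s.map(lambda x: x.lstrip("PPI/"))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: replace missing values with an empty string first.
filled = s.fillna("").map(lambda x: x.lstrip("PPI/"))

# Option 2: only touch the rows that are not missing.
masked = s.copy()
not_missing = masked.notnull()
masked.loc[not_missing] = masked.loc[not_missing].str.lstrip("PPI/")

print(filled.tolist())
print(masked.tolist())
```

Option 1 loses the distinction between "missing" and "empty", while option 2 leaves the NaN in place, so which one fits depends on what later steps expect.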

maybe a better method is to filter using str.startswith, then use split and access the string after the prefix you want to remove:

temp_dataframe.loc[temp_dataframe['PPI'].str.startswith('PPI/', na=False), 'PPI'] = temp_dataframe['PPI'].str.split('PPI/').str[1]
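
A variation on the same idea, sketched under the assumption that the column may contain missing values: `na=False` keeps the mask purely boolean, and the prefix is sliced off by length instead of split. The DataFrame `df` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value and a value that lstrip would mangle.
df = pd.DataFrame({"PPI": ["PPI/123", "456", np.nan, "PPI/PIPE"]})

# na=False treats missing values as "does not start with the prefix",
# so the result can be used directly as a boolean mask.
has_prefix = df["PPI"].str.startswith("PPI/", na=False)

# Slice off the prefix by its length, for the matching rows only.
df.loc[has_prefix, "PPI"] = df.loc[has_prefix, "PPI"].str[len("PPI/"):]

print(df["PPI"].tolist())
```

Slicing by length keeps everything after the prefix, whereas `split('PPI/').str[1]` would cut a value such as 'PPI/a-PPI/b' down to 'a-' because the separator occurs twice.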

As @JonClements pointed out, lstrip strips any leading characters found in the set you pass ('P', 'I' and '/' here; whitespace if no argument is given) rather than removing the prefix, which is what you're after.
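
A short illustration of why lstrip is the wrong tool here, using a plain Python string (the vectorised `.str.lstrip` behaves the same way per element). The value and the `strip_prefix` helper are hypothetical names introduced only for this sketch.

```python
# Hypothetical value whose remainder also begins with 'P' and 'I'.
value = "PPI/PIPE-7"

# lstrip treats its argument as a set of characters {'P', 'I', '/'} and
# keeps stripping until it meets a character outside that set.
over_stripped = value.lstrip("PPI/")


def strip_prefix(text, prefix):
    """Hypothetical helper: drop `prefix` once if `text` starts with it."""
    return text[len(prefix):] if text.startswith(prefix) else text


exact = strip_prefix(value, "PPI/")

print(over_stripped)  # E-7
print(exact)          # PIPE-7
```

On recent versions the helper should not be needed: Python 3.9+ has `str.removeprefix`, and pandas 1.4+ has `Series.str.removeprefix`.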

update

Another method is to pass a regex pattern that looks for the optional prefix and extracts all characters after it:

temp_dataframe['PPI'].str.extract('(?:PPI/)?(.*)', expand=False)
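
A runnable sketch of the extract approach on made-up data, showing that the result needs to be assigned back and that missing values stay missing. The DataFrame `df` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: prefixed, unprefixed, missing, and a lstrip-unsafe value.
df = pd.DataFrame({"PPI": ["PPI/123", "456", np.nan, "PPI/PIPE"]})

# (?:PPI/)? optionally consumes the prefix without capturing it;
# (.*) captures the rest. expand=False returns a Series, not a DataFrame.
df["PPI"] = df["PPI"].str.extract(r"(?:PPI/)?(.*)", expand=False)

print(df["PPI"].tolist())
```

Because the regex search starts at the beginning of each string, the optional group only ever removes a leading 'PPI/', so values without the prefix come back unchanged.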