Notice: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/37919479/
Removing characters from a string in pandas
Asked by A Rob4
I have a similar question to this one: Pandas DataFrame: remove unwanted parts from strings in a column.
So I used:
temp_dataframe['PPI'] = temp_dataframe['PPI'].map(lambda x: x.lstrip('PPI/'))
Most of the items start with 'PPI/', but not all. It seems that an item without the 'PPI/' prefix triggers this error:
AttributeError: 'float' object has no attribute 'lstrip'
Am I missing something here?
Answered by shivsn
Use replace:
temp_dataframe['PPI'] = temp_dataframe['PPI'].replace('PPI/', '', regex=True)
or str.replace:
temp_dataframe['PPI'].str.replace('PPI/','')
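Note that both calls above remove 'PPI/' wherever it appears in a value, not only at the start. The following is a minimal sketch, using made-up sample values that are not from the question, of how anchoring the pattern with ^ might restrict the replacement to a leading 'PPI/':

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a prefixed value, a value with 'PPI/' in the
# middle, an unprefixed value, and a missing value.
s = pd.Series(['PPI/abc', 'x/PPI/y', 'plain', np.nan])

# Unanchored literal replace: every occurrence of 'PPI/' is removed.
anywhere = s.str.replace('PPI/', '', regex=False)

# Anchored regex: only a 'PPI/' at the very start of the string is removed.
prefix_only = s.str.replace(r'^PPI/', '', regex=True)

print(anywhere.tolist())
print(prefix_only.tolist())
```

In both cases the missing value stays missing, because the .str accessor skips it.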
Answered by EdChum
Use vectorised str.lstrip:
temp_dataframe['PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')
It looks like you may have missing values, so you should mask those out or replace them:
temp_dataframe['PPI'] = temp_dataframe['PPI'].fillna('')
or
temp_dataframe.loc[temp_dataframe['PPI'].notnull(), 'PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')
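To illustrate why missing values matter here, the sketch below reproduces the error from the question on a small made-up Series (the sample values and the object dtype are assumptions, not taken from the question) and shows that the .str accessor leaves missing values alone instead of raising:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; dtype=object so the missing value is a plain float NaN.
s = pd.Series(['PPI/abc', np.nan], dtype=object)

# NaN is a float, so calling a str method on it through map fails.
try:
    s.map(lambda x: x.lstrip('PPI/'))
    error = None
except AttributeError as exc:
    error = str(exc)

# The .str accessor skips the missing value and keeps it as NaN.
stripped = s.str.lstrip('PPI/')

print(error)
print(stripped.tolist())
```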
Maybe a better method is to filter using str.startswith, use split, and access the string after the prefix you want to remove:
temp_dataframe.loc[temp_dataframe['PPI'].str.startswith('PPI/', na=False), 'PPI'] = temp_dataframe['PPI'].str.split('PPI/').str[1]
As @JonClements pointed out, lstrip strips leading characters (whitespace by default, or any characters from the set you pass) rather than removing the prefix, which is what you're after.
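A short sketch of the difference: lstrip treats its argument as a set of characters to strip, so it can eat into the value itself. The remove_prefix helper below is hypothetical, written only for this illustration, and the sample values are made up:

```python
import numpy as np
import pandas as pd

# lstrip('PPI/') strips every leading 'P', 'I' and '/' character,
# so it also removes the start of the value that follows the prefix.
over_stripped = 'PPI/IPA'.lstrip('PPI/')
print(over_stripped)  # -> 'A'


def remove_prefix(text, prefix):
    """Hypothetical helper: drop prefix only when text is a str starting with it."""
    if isinstance(text, str) and text.startswith(prefix):
        return text[len(prefix):]
    return text  # non-strings (such as NaN) and unprefixed values pass through


# Hypothetical sample data, including a missing value.
s = pd.Series(['PPI/IPA', 'plain', np.nan], dtype=object)
cleaned = s.map(lambda x: remove_prefix(x, 'PPI/'))
print(cleaned.tolist())
```

On Python 3.9 or later, the built-in str.removeprefix does the same job for plain strings.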
Update
Another method is to pass a regex pattern that looks for the optional prefix and extracts all characters after it:
temp_dataframe['PPI'].str.extract('(?:PPI/)?(.*)', expand=False)
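As a rough check of how that pattern behaves, here is a sketch on made-up sample values (not from the question). The optional non-capturing group consumes a leading 'PPI/' when present, and the capture group returns whatever follows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: prefixed, unprefixed, 'PPI/' in the middle, missing.
s = pd.Series(['PPI/abc', 'plain', 'x/PPI/y', np.nan])

# expand=False returns a Series because the pattern has a single capture group.
result = s.str.extract(r'(?:PPI/)?(.*)', expand=False)

print(result.tolist())
```

Since extract returns a new Series, the result would need to be assigned back to the column to take effect.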