Original question: http://stackoverflow.com/questions/41719259/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
How to remove numbers from string terms in a pandas dataframe
Asked by Mayank
I have a data frame similar to the one below:
Name Volume Value
May21 23 21321
James 12 12311
Adi22 11 4435
Hello 34 32454
Girl90 56 654654
I want the output to be in the format:
Name Volume Value
May 23 21321
James 12 12311
Adi 11 4435
Hello 34 32454
Girl 56 654654
I want to remove all the numbers from the Name column.
The closest I have come is doing it at the cell level with the following code:
result = ''.join([i for i in df['Name'][1] if not i.isdigit()])
Is there a better way to do this at the Series/DataFrame level?
Answered by Milo
You can apply str.replace to the Name column in combination with regular expressions:
import pandas as pd
# Example DataFrame
df = pd.DataFrame.from_dict({'Name': ['May21', 'James', 'Adi22', 'Hello', 'Girl90'],
                             'Volume': [23, 12, 11, 34, 56],
                             'Value': [21321, 12311, 4435, 32454, 654654]})
# regex=True is required on pandas 2.0 and later, where str.replace matches literally by default
df['Name'] = df['Name'].str.replace(r'\d+', '', regex=True)
print(df)
Output:
    Name  Volume   Value
0    May      23   21321
1  James      12   12311
2    Adi      11    4435
3  Hello      34   32454
4   Girl      56  654654
In the regular expression, \d stands for "any digit" and + stands for "one or more".
Thus, str.replace with the pattern \d+ and an empty replacement string means: "Replace every run of digits in the strings with nothing".
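To see what the pattern matches outside of pandas, here is a minimal sketch using Python's standard re module. The names list is copied from the example DataFrame above; the variable names digit_runs and cleaned are only illustrative.

```python
import re

names = ['May21', 'James', 'Adi22', 'Hello', 'Girl90']

# \d+ matches each run of one or more consecutive digits.
digit_runs = [re.findall(r'\d+', name) for name in names]

# Replacing every match with the empty string keeps only the non-digit characters.
cleaned = [re.sub(r'\d+', '', name) for name in names]

print(digit_runs)
print(cleaned)
```

The raw-string prefix (r'...') keeps Python from treating \d as a string escape sequence, which is why it is generally preferred for regular expression patterns.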
Answered by MYGz
You can do it like so:
df.Name = df.Name.str.replace(r'\d+', '', regex=True)
To experiment with the pattern, see the online regular expression demo here: https://regex101.com/r/Y6gJny/2
Whatever is matched by the pattern \d+, i.e. one or more digits, will be replaced by an empty string.
Answered by Andras Deak
Although the question sounds more general, the example input only contains trailing numbers. In this case you don't have to use regular expressions, since .rstrip (also available via the .str accessor of Series objects) can do exactly this:
import string
df['Name'] = df['Name'].str.rstrip(string.digits)
Similarly, you can use .lstrip to strip any digits from the start, or .strip to remove any digits from the start and the end of each string.
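The difference between the three methods, and between them and the regular expression approach, can be sketched as follows. The sample strings here are made up for illustration (they are not from the question) so that digits appear at the start, in the middle, and at the end.

```python
import string

import pandas as pd

# Hypothetical sample data with leading, interior, and trailing digits.
s = pd.Series(['12May21', 'Ad3i22', 'Hello'])

trailing_removed = s.str.rstrip(string.digits).tolist()  # digits at the end only
leading_removed = s.str.lstrip(string.digits).tolist()   # digits at the start only
both_removed = s.str.strip(string.digits).tolist()       # digits at both ends

# For comparison: the regex approach also removes digits inside the string.
all_removed = s.str.replace(r'\d+', '', regex=True).tolist()

print(trailing_removed)
print(leading_removed)
print(both_removed)
print(all_removed)
```

Note that the strip family never touches the interior digit in 'Ad3i22', so it is only a substitute for the regular expression when the digits are known to sit at the ends of the strings.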
Answered by Daniil Mashkin
The .str accessor is not necessary. You can use the pandas DataFrame.replace or Series.replace methods with the regex=True argument.
df.replace(r'\d+', '', regex=True)
If you want to change the source dataframe, use inplace=True.
df.replace(r'\d+', '', regex=True, inplace=True)
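As a rough illustration of how this behaves on the question's data, the sketch below applies replace to the whole frame and to a single column. The variable names cleaned and name_only are illustrative. Regex replacement only acts on string values, so the numeric Volume and Value columns are left as they are.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['May21', 'James', 'Adi22', 'Hello', 'Girl90'],
                   'Volume': [23, 12, 11, 34, 56],
                   'Value': [21321, 12311, 4435, 32454, 654654]})

# Without inplace=True, replace returns a new DataFrame and leaves df unchanged.
cleaned = df.replace(r'\d+', '', regex=True)

# To limit the replacement to one column, call replace on that Series instead.
name_only = df['Name'].replace(r'\d+', '', regex=True)

print(cleaned)
print(name_only.tolist())
```

Assigning the result back (for example df['Name'] = name_only) is an alternative to inplace=True when only one column should change.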