Disclaimer: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/41719259/

Date: 2020-08-20 01:30:03  Source: igfitidea

How to remove numbers from string terms in a pandas dataframe

python, string, pandas

Asked by Mayank

I have a data frame similar to the one below:

Name    Volume  Value
May21   23      21321
James   12      12311
Adi22   11      4435
Hello   34      32454
Girl90  56      654654

I want the output to be in the format:

Name    Volume  Value
May     23      21321
James   12      12311
Adi     11      4435
Hello   34      32454
Girl    56      654654

I want to remove all the numbers from the Name column.

The closest I have come is doing it at the cell level with the following code:

result = ''.join([i for i in df['Name'][1] if not i.isdigit()])

Any idea how to do it in a better way at the Series/DataFrame level?
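As a baseline, the cell-level expression above can already be lifted to the whole column with Series.apply. It works, but it calls a Python function once per cell rather than using a vectorized string method. A minimal sketch, rebuilding the example frame from the question:

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({'Name': ['May21', 'James', 'Adi22', 'Hello', 'Girl90'],
                   'Volume': [23, 12, 11, 34, 56],
                   'Value': [21321, 12311, 4435, 32454, 654654]})

# Run the question's per-cell expression on every value in the Name column.
# This is a Python-level loop over the rows, not a vectorized operation.
df['Name'] = df['Name'].apply(lambda s: ''.join(c for c in s if not c.isdigit()))

print(df['Name'].tolist())  # ['May', 'James', 'Adi', 'Hello', 'Girl']
```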

Answer by Milo

You can apply str.replace to the Name column in combination with regular expressions:

import pandas as pd

# Example DataFrame
df = pd.DataFrame.from_dict({'Name'  : ['May21', 'James', 'Adi22', 'Hello', 'Girl90'],
                             'Volume': [23, 12, 11, 34, 56],
                             'Value' : [21321, 12311, 4435, 32454, 654654]})

# Raw string for the pattern; regex=True so '\d+' is treated as a regular expression
df['Name'] = df['Name'].str.replace(r'\d+', '', regex=True)

print(df)

Output:

    Name  Volume   Value
0    May      23   21321
1  James      12   12311
2    Adi      11    4435
3  Hello      34   32454
4   Girl      56  654654

In the regular expression, \d stands for "any digit" and + stands for "one or more".

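Thus, str.replace(r'\d+', '', regex=True) means: "Replace every run of digits in each string with nothing". Pass regex=True explicitly: since pandas 2.0 the default is regex=False, which would treat '\d+' as literal text.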
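The same pattern can be tried on plain strings with Python's re module. This shows that it removes every run of digits, not only trailing ones (the string 'Ma1y21' is a made-up example, not from the question):

```python
import re

# \d matches any digit; + extends each match over a whole run of digits.
pattern = re.compile(r'\d+')

print(pattern.findall('Ma1y21'))  # ['1', '21']  (the runs that get removed)
print(pattern.sub('', 'Ma1y21'))  # 'May'        (digits removed anywhere)
print(pattern.sub('', 'Girl90'))  # 'Girl'
```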

Answer by MYGz

You can do it like so:

df.Name = df.Name.str.replace(r'\d+', '', regex=True)

To experiment with the pattern, see this online regular expression demo: https://regex101.com/r/Y6gJny/2

Whatever is matched by the pattern \d+, i.e. one or more digits, will be replaced by an empty string.
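Whether '\d+' is read as a pattern or as literal text is controlled by the regex argument of Series.str.replace, so it is worth passing it explicitly. A small check on a few of the example names:

```python
import pandas as pd

names = pd.Series(['May21', 'James', 'Adi22'])

# regex=True: '\d+' is a regular expression matching runs of digits.
as_regex = names.str.replace(r'\d+', '', regex=True)

# regex=False: '\d+' is the literal three characters '\', 'd', '+',
# which never occur in these names, so nothing changes.
as_literal = names.str.replace(r'\d+', '', regex=False)

print(as_regex.tolist())    # ['May', 'James', 'Adi']
print(as_literal.tolist())  # ['May21', 'James', 'Adi22']
```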

Answer by Andras Deak

Although the question sounds more general, the example input only contains trailing numbers. In this case you don't have to use regular expressions, since .rstrip (also available via the .str accessor of Series objects) can do exactly this:

import string
df['Name'] = df['Name'].str.rstrip(string.digits)

Similarly, you can use .lstrip to strip any digits from the start, or .strip to remove any digits from both the start and the end of each string.
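The strip methods only touch the ends of each string, so they give a different result from the regex approach when digits also appear in the middle. A quick comparison on made-up values (not from the question):

```python
import string

import pandas as pd

# Made-up values: digits at the start, in the middle, and at the end.
s = pd.Series(['12Ma1y21', 'Girl90', 'James'])

right = s.str.rstrip(string.digits)               # trailing digits only
left = s.str.lstrip(string.digits)                # leading digits only
both = s.str.strip(string.digits)                 # leading and trailing digits
anywhere = s.str.replace(r'\d+', '', regex=True)  # every run of digits

print(right.tolist())     # ['12Ma1y', 'Girl', 'James']
print(left.tolist())      # ['Ma1y21', 'Girl90', 'James']
print(both.tolist())      # ['Ma1y', 'Girl', 'James']
print(anywhere.tolist())  # ['May', 'Girl', 'James']
```

For the question's data, where digits only trail the names, rstrip and the regex give the same result.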

Answer by Daniil Mashkin

The .str accessor is not necessary. You can use pandas DataFrame.replace or Series.replace with the regex=True argument:

df.replace(r'\d+', '', regex=True)

If you want to modify the source DataFrame in place, use inplace=True:

df.replace(r'\d+', '', regex=True, inplace=True)
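Called on the whole DataFrame, the regex is applied to every string column, while numeric columns such as Volume and Value are left unchanged. If other string columns must keep their digits, calling replace on the Name column alone limits the effect. A minimal sketch using the example frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['May21', 'James', 'Adi22', 'Hello', 'Girl90'],
                   'Volume': [23, 12, 11, 34, 56],
                   'Value': [21321, 12311, 4435, 32454, 654654]})

# Whole frame: the regex applies to string values; integer columns are untouched.
whole = df.replace(r'\d+', '', regex=True)

# Single column: only Name is modified, whatever other columns exist.
only_name = df.copy()
only_name['Name'] = only_name['Name'].replace(r'\d+', '', regex=True)

print(whole['Name'].tolist())      # ['May', 'James', 'Adi', 'Hello', 'Girl']
print(whole['Volume'].tolist())    # [23, 12, 11, 34, 56]
print(only_name['Name'].tolist())  # ['May', 'James', 'Adi', 'Hello', 'Girl']
```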