Note: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/35376387/

Date: 2020-08-19 16:22:30  Source: igfitidea

Extract int from string in Pandas

python pandas dataframe

Asked by user5739619

Let's say I have a dataframe df as follows:

A B
1 V2
3 W42
1 S03
2 T02
3 U71

I want a new column (either appended at the end of df or replacing column B, it doesn't matter) that contains only the int extracted from column B. That is, I want column C to look like:

C
2
42
3
2
71

So if there is a 0 in front of the number, as in 03, I want to return 3, not 03.

How can I do this?

Accepted answer by Lokesh A. R.

You can convert to string and extract the integer using regular expressions:

df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)
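
For reference, a self-contained sketch of this approach on the sample data from the question. It assumes the result is stored in a new column C and that expand=False is passed, so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# Capture the first run of digits, then convert it to int (drops leading zeros)
df['C'] = df['B'].str.extract(r'(\d+)', expand=False).astype(int)

print(df['C'].tolist())  # [2, 42, 3, 2, 71]
```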

Answer by Mike Graham

Assuming there is always exactly one leading letter:

df['B'] = df['B'].str[1:].astype(int)
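
A minimal sketch of the slicing approach on the question's sample data. It writes to a new column C instead of overwriting B (an assumption made for illustration), and it only works when every value has exactly one leading non-digit character.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# Drop the first character, then convert the remaining digits to int
df['C'] = df['B'].str[1:].astype(int)

print(df['C'].tolist())  # [2, 42, 3, 2, 71]
```

A value such as 'AB12' would raise a ValueError here, because 'B12' is not a valid integer literal.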

Answer by boesjes

I wrote a little loop to do this, as my strings were in a list rather than in a DataFrame. This way, you can also add a small if statement to account for floats:

text = 'whatever.007'
output = ''

for letter in text:
    try:
        int(letter)          # raises ValueError for non-digit characters
        output += letter
    except ValueError:
        pass

    # keep the decimal point so that floats survive
    if letter == '.':
        output += letter

output = float(output)

Alternatively, call int(output) instead of float(output) when the string contains no decimal point.
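
The same idea can be wrapped in a function and applied to a whole list. This is only a sketch: extract_number is a hypothetical helper name, and it assumes each string contains at least one digit and at most one decimal point.

```python
import string

def extract_number(text):
    """Hypothetical helper: keep digits and '.', then convert to a number."""
    kept = ''.join(ch for ch in text if ch in string.digits or ch == '.')
    # Return a float only when a decimal point was kept, otherwise an int
    return float(kept) if '.' in kept else int(kept)

values = ['V2', 'W42', 'S03', 'whatever.007']
numbers = [extract_number(v) for v in values]
print(numbers)  # [2, 42, 3, 0.007]
```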

Answer by Kohn1001

Prepare a DataFrame identical to yours:

import re
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Now manipulate it to get the desired outcome:

# int() drops leading zeros, so '03' becomes 3
df['C'] = df['B'].apply(lambda x: int(re.search(r'\d+', x).group()))

df.head()

    A   B   C
0   1   V2  2
1   3   W42 42
2   1   S03 3
3   2   T02 2
4   3   U71 71
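
If some values contain no digits, re.search returns None and calling .group() on it raises an AttributeError. One way to tolerate such rows is sketched below, using a made-up value 'XYZ' and pandas' nullable Int64 dtype. Both are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

s = pd.Series(['V2', 'XYZ', 'S03'])  # 'XYZ' is a hypothetical value with no digits

# str.extract yields NaN where the pattern does not match
digits = s.str.extract(r'(\d+)', expand=False)

# Convert to numbers, then to nullable integers so missing rows stay missing
numbers = pd.to_numeric(digits).astype('Int64')

print(numbers)
```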