Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/30780742/

Time: 2020-08-19 08:58:19  Source: igfitidea

Get substring from pandas dataframe while filtering

python pandas

Asked by Eduardo

Say I have a dataframe with the following information:

Name    Points          String
John        24     FTS8500001A
Richard     35     FTS6700001B
John        29     FTS2500001A
Richard     35     FTS3800001B
John        34     FTS4500001A

Here is the way to get a DataFrame with the sample above:

import pandas as pd
keys = ('Name', 'Points', 'String')
names = pd.Series(('John', 'Richard', 'John', 'Richard', 'John'))
ages = pd.Series((24,35,29,35,34))
strings = pd.Series(('FTS8500001A','FTS6700001B','FTS2500001A','FTS3800001B','FTS4500001A'))
df = pd.concat((names, ages, strings), axis=1, keys=keys)

I want to select every row that meets the following criteria: Name=Richard and Points=35. For those rows, I want to read the 4th and 5th characters of the String column (the two digits just after FTS).

The output I want is the numbers 67 and 38.

I've tried several ways to achieve it but with zero results. Can you please help?

Thank you very much.
Eduardo

Accepted answer by EdChum

Use a boolean mask to filter your df, then call .str and slice the string:

In [77]:
df.loc[(df['Name'] == 'Richard') & (df['Points']==35),'String'].str[3:5]

Out[77]:
1    67
3    38
Name: String, dtype: object
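
Note that the result above holds strings, not numbers. Since the question asks for the numbers 67 and 38, here is a minimal self-contained sketch that applies the same mask and slice and then converts the result to integers. The variable names mask, codes and numbers are only illustrative, and the frame is rebuilt with a dict instead of pd.concat purely for brevity:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    'Name': ['John', 'Richard', 'John', 'Richard', 'John'],
    'Points': [24, 35, 29, 35, 34],
    'String': ['FTS8500001A', 'FTS6700001B', 'FTS2500001A',
               'FTS3800001B', 'FTS4500001A'],
})

# Boolean mask: both conditions must hold, so combine them with &.
# The parentheses are required because & binds tighter than ==.
mask = (df['Name'] == 'Richard') & (df['Points'] == 35)

# Select the String column for the matching rows and take characters 3 and 4
# (zero-based), i.e. the 4th and 5th characters.
codes = df.loc[mask, 'String'].str[3:5]

# The slice yields strings; convert to integers if numbers are needed.
numbers = codes.astype(int)
print(numbers.tolist())  # [67, 38]
```

The original row labels (1 and 3) are preserved in both codes and numbers, so the values can still be matched back to df.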

Answer by firelynx

Pandas string methods

You can build a mask from your criteria and then use the pandas string methods:

mask_richard = df.Name == 'Richard'
mask_points = df.Points == 35
df[mask_richard & mask_points].String.str[3:5]

1    67
3    38
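
As a possible variation on the same idea, another pandas string method, str.extract, can pull out the digits with a regular expression instead of fixed positions. This is only a sketch and assumes every value starts with the literal prefix FTS followed by at least two digits, which holds for the sample data but is not stated in the question. The name two_digits is illustrative:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    'Name': ['John', 'Richard', 'John', 'Richard', 'John'],
    'Points': [24, 35, 29, 35, 34],
    'String': ['FTS8500001A', 'FTS6700001B', 'FTS2500001A',
               'FTS3800001B', 'FTS4500001A'],
})

mask_richard = df['Name'] == 'Richard'
mask_points = df['Points'] == 35

# Capture the two digits right after the assumed "FTS" prefix.
# expand=False returns a Series when the pattern has a single capture group.
two_digits = df.loc[mask_richard & mask_points, 'String'].str.extract(
    r'^FTS(\d{2})', expand=False
)
print(two_digits.tolist())  # ['67', '38']
```

For strings with a fixed layout like these, the plain slice .str[3:5] is simpler. The regular expression is mainly useful when a value might not match the expected layout, in which case str.extract yields a missing value for that row instead of an arbitrary slice.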