Extract the substring between 2 special characters from a column of a Pandas DataFrame

Question

Asked by raja

I have a Python Pandas DataFrame like this:

Name  
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan

I would like to extract only the title from the Name column and copy it into a new column named Title. The output DataFrame should look like this:

Name                    Title
Jim, Mr. Jones          Mr
Sara, Miss. Baker       Miss
Leila, Mrs. Jacob       Mrs
Ramu, Master. Kuttan    Master

I tried to find a solution with regex but could not get a proper result.

Answer 1

Accepted answer by MaxU

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

or

In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]

In [164]: df
Out[164]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
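
For anyone who wants to run both approaches outside an IPython session, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the variable names title_extract and title_split are only illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# Approach 1: capture whatever sits between the comma and the first period.
# expand=False returns a Series instead of a one-column DataFrame.
title_extract = df["Name"].str.extract(r",\s*([^.]*)\s*\.", expand=False)

# Approach 2: split on commas or periods (with surrounding whitespace)
# and keep the second piece.
title_split = df["Name"].str.split(r"\s*,\s*|\s*\.\s*").str[1]

df["Title"] = title_extract
print(df)
```

The two approaches agree on this data, but they are likely to differ on rows that do not match: str.extract yields NaN when the pattern is absent, while the split version returns whatever happens to be the second piece.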

Answer 2

Answered by svdc

Have a look at str.extract.

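The regexp you are looking for is (?<=, )\w+(?=\.). In words: take the substring that is preceded by ", " (not included in the match), consists of at least one word character, and is followed by a "." (also not included). The period must be escaped, since an unescaped . matches any character. Also note that str.extract requires a capture group, so wrap the whole pattern in parentheses when passing it in. In future, use an online regexp tester such as regex101; regexps become rather trivial that way.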

This assumes each entry in the Name column is formatted the same way.
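
As a rough illustration, the lookaround pattern from this answer can be passed to str.extract once the period is escaped and the whole expression is wrapped in a capture group. Both adjustments, and the extra "Madonna" row used to show a non-matching entry, are assumptions added for this sketch.

```python
import pandas as pd

# Two rows from the question plus one hypothetical row without a title.
names = pd.Series(["Jim, Mr. Jones", "Sara, Miss. Baker", "Madonna"])

# (?<=, ) : must be preceded by a comma and a space (not captured)
# \w+     : one or more word characters (the title itself)
# (?=\.)  : must be followed by a literal period (not captured)
# The outer parentheses form the capture group that str.extract requires.
pattern = r"((?<=, )\w+(?=\.))"

titles = names.str.extract(pattern, expand=False)
print(titles)
```

Rows that do not follow the "Surname, Title. Name" layout come back as NaN, which makes inconsistently formatted entries easy to spot with titles.isna().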
