Extract sub-string between 2 special characters from one column of Pandas DataFrame
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44000278/
Asked by raja
I have a Python Pandas DataFrame like this:
Name
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan
I would like to extract only the title (Mr, Miss, Mrs, Master) from the Name column and copy it into a new column named Title. The output DataFrame should look like this:
Name                  Title
Jim, Mr. Jones        Mr
Sara, Miss. Baker     Miss
Leila, Mrs. Jacob     Mrs
Ramu, Master. Kuttan  Master
I tried to find a solution with regex but could not get a correct result.
Accepted answer by MaxU
In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)
In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
or
In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]
In [164]: df
Out[164]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
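For readers who want to run both approaches outside an IPython session, here is a minimal self-contained sketch. It assumes the sample data from the question; the variable names `title_by_extract` and `title_by_split` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# Approach 1: capture the text between the comma and the next period.
#   ,\s*       a comma followed by optional whitespace
#   ([^\.]*)   the captured title: a run of characters that are not "."
#   \s*\.      optional whitespace, then the period that ends the title
title_by_extract = df["Name"].str.extract(r",\s*([^\.]*)\s*\.", expand=False)

# Approach 2: split on commas or periods (with surrounding whitespace)
# and take the second piece of each split result.
title_by_split = df["Name"].str.split(r"\s*,\s*|\s*\.\s*").str[1]

df["Title"] = title_by_extract
print(df)
```

With `expand=False` and a single capture group, `str.extract` returns a Series, so the result can be assigned directly to a new column.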
Answer by svdc
Have a look at str.extract.
The regexp you are looking for is (?<=, )\w+(?=\.). In words: take the substring that is preceded by ", " (but do not include it), consists of at least one word character, and is followed by a "." (but do not include it). In future, use an online regexp tester such as regex101; regexps become rather trivial that way.
This assumes each entry in the Name column is formatted the same way.
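One caveat when using a lookaround-only regexp with str.extract: pandas requires the pattern to contain at least one capture group, and lookarounds alone do not provide one. A minimal sketch, assuming the sample data from the question, wraps the whole expression in parentheses and escapes the period in the lookahead so it matches a literal ".":

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# (?<=, )  lookbehind: the match must be preceded by a comma and a space
# \w+      one or more word characters (the title itself)
# (?=\.)   lookahead: the match must be followed by a literal period
# The outer parentheses turn the whole match into the capture group
# that str.extract needs.
pattern = r"((?<=, )\w+(?=\.))"

df["Title"] = df["Name"].str.extract(pattern, expand=False)
print(df["Title"].tolist())
```

Rows that do not follow the "Name, Title. Surname" layout will not match and get NaN in the Title column, which is a quick way to spot entries that break the formatting assumption.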