pandas ValueError: pattern contains no capture groups

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/54343378/
Asked by Chan
When I use a regular expression directly, I get a match:
import re
string = r'http://www.example.com/abc.html'
result = re.search('^.*com', string).group()  # result == 'http://www.example.com'
In pandas, I write:
df = pd.DataFrame(columns = ['index', 'url'])
df.loc[len(df), :] = [1, 'http://www.example.com/abc.html']
df.loc[len(df), :] = [2, 'http://www.hello.com/def.html']
df.str.extract('^.*com')
ValueError: pattern contains no capture groups
How can I solve this problem?
Thanks.
Answered by cs95
According to the docs, you need to specify a capture group (i.e., parentheses) for str.extract to, well, extract.
Series.str.extract(pat, flags=0, expand=True)
For each subject string in the Series, extract groups from the first match of regular expression pat.
Each capture group constitutes its own column in the output.
df.url.str.extract(r'(.*\.com)')
                        0
0  http://www.example.com
1    http://www.hello.com
# If you need named capture groups,
df.url.str.extract(r'(?P<URL>.*\.com)')
                      URL
0  http://www.example.com
1    http://www.hello.com
Or, if you need a Series,
df.url.str.extract(r'(.*\.com)', expand=False)
0    http://www.example.com
1      http://www.hello.com
Name: url, dtype: object
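To illustrate the line from the docs about each capture group becoming its own column, here is a small sketch (my addition, not part of the original answer) that uses two groups to split the same URLs into a host part and a path part:
df.url.str.extract(r'(.*\.com)(/.*)')
                        0          1
0  http://www.example.com  /abc.html
1    http://www.hello.com  /def.html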
Answered by jezrael
You need to specify the column url and wrap the pattern in () to create a capture group:
df['new'] = df['url'].str.extract(r'(^.*com)')
print(df)
  index                              url                     new
0     1  http://www.example.com/abc.html  http://www.example.com
1     2    http://www.hello.com/def.html    http://www.hello.com
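For completeness, a minimal, self-contained sketch (my addition, built from the question's data) that reproduces the ValueError and then applies the fix:
import pandas as pd

df = pd.DataFrame({'index': [1, 2],
                   'url': ['http://www.example.com/abc.html',
                           'http://www.hello.com/def.html']})

try:
    df['url'].str.extract(r'^.*com')        # no capture group
except ValueError as e:
    print(e)                                # pattern contains no capture groups

# wrap the pattern in () -> works; expand=False returns a Series
df['new'] = df['url'].str.extract(r'(^.*com)', expand=False)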
Answered by anky
Try this Python library; it works well for this purpose:
Using urllib.parse
from urllib.parse import urlparse
df['domain'] = df.url.apply(lambda x: urlparse(x).netloc)
print(df)
  index                              url           domain
0     1  http://www.example.com/abc.html  www.example.com
1     2    http://www.hello.com/def.html    www.hello.com
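As a quick reference (my addition, not part of the original answer), urlparse also exposes the other URL components, so you are not limited to the domain:
from urllib.parse import urlparse

parts = urlparse('http://www.example.com/abc.html')
print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /abc.html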