Python: Remove Part of String Before the Last Forward Slash
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/29657384/
Asked by freddiev4
The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.
So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".
Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.
These are two of my several attempts:
for link in link_list:
    file_names.append(link.replace('/[^/]*$',''))
print(file_names)

and

for link in link_list:
    file_names.append(link.rpartition('//')[-1])
print(file_names)
Accepted answer by Bhargav Rao
Have a look at str.rsplit.
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'
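Applied to the list from the question, the same call fits in a list comprehension. This is only a minimal sketch: the contents of link_list below are made-up sample values standing in for the scraped URLs.

```python
# Hypothetical sample data standing in for the scraped URLs.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://docs.python.org/3.4/tutorial/index.html',
]

# Split each link once, from the right, and keep the last piece.
file_names = [link.rsplit('/', 1)[-1] for link in link_list]
print(file_names)
```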
And to use a regex:
>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'
Then take the 2nd group, which lies between the last / and the end of the string. This relies on greedy matching in the regex: the first (.*) consumes as much as it can, so the / in the pattern lines up with the last slash.
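To see why greediness matters here, the pattern can be compared with a non-greedy variant. The sketch below is illustrative only; the names greedy and lazy are not from the original answer.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first (.*) grabs as much as it can, so '/' is the last slash.
greedy = re.search(r'(.*)/(.*)', s)
print(greedy.group(1))  # everything before the last slash
print(greedy.group(2))  # everything after the last slash

# Non-greedy: (.*?) grabs as little as it can, so '/' is the first slash.
lazy = re.search(r'(.*?)/(.*)', s)
print(lazy.group(1))
print(lazy.group(2))
```

With the non-greedy version the second group starts right after the first slash of the scheme, which is not what the question wants.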
Small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra /, as in link.rpartition('/')[-1].
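The difference between the two separators is easy to check on the URL from the question. A quick sketch (the variable names are only for illustration):

```python
s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# '//' only occurs right after the scheme, so the split happens there.
after_double = s.rpartition('//')[-1]
# '/' splits at the last single slash, which is what the question asks for.
after_single = s.rpartition('/')[-1]

print(after_double)
print(after_single)
```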
Answer by TigerhawkT3
That doesn't need regex.
import os
for link in link_list:
    file_names.append(os.path.basename(link))
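os.path.basename is meant for file-system paths, so it keeps any query string or fragment that a URL carries. If the links may contain those, one option is to parse the URL first. The following is a sketch using only the standard library; the helper name url_basename and the sample URL are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse

def url_basename(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(url).path          # e.g. '/3.4/tutorial/interpreter.html'
    return posixpath.basename(path)    # always splits on '/', on any OS

name = url_basename('https://docs.python.org/3.4/tutorial/interpreter.html?x=1#top')
print(name)
```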
Answer by McCroskey
Just use str.split:
url = "/some/url/with/a/file.html"
print(url.split("/")[-1])
# Result should be "file.html"
split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
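One edge case worth knowing: if the URL ends with a slash, the last element of the list is an empty string. A small sketch with a made-up sample path, stripping trailing slashes first:

```python
url = "/some/url/with/a/dir/"

parts = url.split("/")
print(parts)        # the list ends with '' because of the trailing slash
print(parts[-1])

# Dropping trailing slashes before splitting gives the last real segment.
last = url.rstrip("/").split("/")[-1]
print(last)
```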
Answer by deme72
This should work if you plan to use regex:
import re
for link in link_list:
    # str.replace() only matches literal text, so use re.sub() for a pattern
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)
Answer by dawg
You can use rpartition():
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')
And take the last part of the 3-element tuple that is returned:
>>> s.rpartition('/')[2]
'interpreter.html'
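rpartition() always returns three elements, even when the separator is missing, so indexing [2] never raises. A short sketch contrasting that with rsplit() on a string without any slash (the sample string is made up):

```python
s = 'interpreter.html'  # no slash at all

# Separator not found: two empty strings, then the original string.
parts = s.rpartition('/')
print(parts)

# rsplit() returns a one-element list in the same situation,
# so [1] would raise IndexError while [-1] still works.
pieces = s.rsplit('/', 1)
print(pieces)
```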
Answer by sandoronodi
Here's a more general, regex way of doing this:
>>> import re
>>> re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
'interpreter.html'