Python: Remove Part of String Before the Last Forward Slash

Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/29657384/

Date: 2020-08-19 04:50:58  Source: igfitidea

Remove Part of String Before the Last Forward Slash

Tags: python, regex, string, replace

Asked by freddiev4

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.

These are two of my several attempts:

for link in link_list:
   file_names.append(link.replace('/[^/]*$',''))
print(file_names)

and

for link in link_list:
   file_names.append(link.rpartition('//')[-1])
print(file_names)

Accepted answer by Bhargav Rao

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'

And to use a regex instead:

>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'

Then take the second group, which lies between the last / and the end of the string. This relies on greedy matching in regex: the first .* consumes as much as it can, so the / in the pattern lines up with the last slash.
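
To make the role of greediness concrete, here is a minimal sketch comparing the pattern above with a non-greedy variant. The names greedy and lazy are only for illustration.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first (.*) grabs as much as it can, so the '/' in the
# pattern lines up with the LAST slash in the string.
greedy = re.search(r'(.*)/(.*)', s).group(2)

# Non-greedy: the first (.*?) grabs as little as it can, so the '/'
# lines up with the FIRST slash instead.
lazy = re.search(r'(.*?)/(.*)', s).group(2)

print(greedy)  # interpreter.html
print(lazy)    # /docs.python.org/3.4/tutorial/interpreter.html
```

Only the greedy form isolates the file name, which is why the plain .* is the right choice here.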

Small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra /, as in link.rpartition('/')[-1].
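
The difference between the two separators can be seen in a short sketch using the URL from the question. The variable names are illustrative only.

```python
s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Splitting on '//' finds the only double slash, right after 'https:'.
after_double = s.rpartition('//')[-1]

# Splitting on '/' finds the last single slash.
after_single = s.rpartition('/')[-1]

print(after_double)  # docs.python.org/3.4/tutorial/interpreter.html
print(after_single)  # interpreter.html
```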

Answer by TigerhawkT3

That doesn't need regex.

import os

for link in link_list:
    file_names.append(os.path.basename(link))

Answer by McCroskey

Just use str.split:

url = "/some/url/with/a/file.html"

print(url.split("/")[-1])

# Result should be "file.html"

split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
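
As a rough sketch of what split returns for this URL, including one edge case worth knowing about (the trailing-slash input is an assumed example, not from the original answer):

```python
url = "/some/url/with/a/file.html"

parts = url.split("/")
print(parts)      # ['', 'some', 'url', 'with', 'a', 'file.html']
print(parts[-1])  # file.html

# A trailing slash leaves an empty string as the last element.
trailing = "/some/url/with/a/".split("/")[-1]
print(trailing == "")  # True
```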

Answer by deme72

This should work if you plan to use regex

import re

for link in link_list:
    # str.replace() treats its argument literally, so use re.sub() for a pattern
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)

Answer by dawg

You can use rpartition():

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')

And take the last part of the 3 element tuple that is returned:

>>> s.rpartition('/')[2]
'interpreter.html'
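
Applied to a whole list, this might look like the sketch below. link_list and file_names follow the names used in the question; the second and third entries are made up for illustration. When no slash is present, rpartition returns two empty strings followed by the original string, so index 2 still yields something sensible.

```python
# link_list stands in for the list of URLs from the question;
# the second and third entries are hypothetical examples.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://example.com/a/b/index.html',
    'no-slashes-here.html',
]

# Keep only the text after the last '/' of each link.
file_names = [link.rpartition('/')[2] for link in link_list]
print(file_names)
# ['interpreter.html', 'index.html', 'no-slashes-here.html']
```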

Answer by sandoronodi

Here's a more general, regex way of doing this:

    >>> import re
    >>> re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
    'interpreter.html'