Python: remove part of a string before the last forward slash

Question

Asked by freddiev4

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.

These are two of my several attempts:

for link in link_list:
   file_names.append(link.replace('/[^/]*$',''))
print(file_names)

&

for link in link_list:
   file_names.append(link.rpartition('//')[-1])
print(file_names)

Answer 1

Accepted answer by Bhargav Rao

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'

Or, to use a regular expression:

>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'

Here the first group is greedy, so it consumes everything up to the last /. The second group then matches whatever lies between the last / and the end of the string.
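To make the effect of greediness concrete, here is a small sketch comparing the greedy pattern from this answer with a lazy variant. The lazy pattern is not part of the original answer; it is shown only for contrast, and the variable names are illustrative.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first (.*) grabs as much as it can, so the '/' in the
# pattern lines up with the LAST slash in the string.
greedy = re.search(r'(.*)/(.*)', s).groups()

# Lazy: (.*?) grabs as little as it can, so the '/' in the pattern
# lines up with the FIRST slash instead.
lazy = re.search(r'(.*?)/(.*)', s).groups()

print(greedy)  # ('https://docs.python.org/3.4/tutorial', 'interpreter.html')
print(lazy)    # ('https:', '/docs.python.org/3.4/tutorial/interpreter.html')
```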

Small note: the problem with link.rpartition('//')[-1] in your code is that it looks for // rather than /. Remove the extra /, as in link.rpartition('/')[-1].
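As a rough illustration of why both attempts in the question fall short, the sketch below runs them against the sample URL. The variable names are made up for this example.

```python
url = "https://docs.python.org/3.4/tutorial/interpreter.html"

# str.replace treats its first argument as literal text, not as a regex,
# so the "pattern" is never found and the URL comes back unchanged.
attempt_1 = url.replace('/[^/]*$', '')

# rpartition('//') splits at the last '//', which is the one after 'https:'.
attempt_2 = url.rpartition('//')[-1]

# Splitting on a single '/' gives the intended result.
fixed = url.rpartition('/')[-1]

print(attempt_1)  # https://docs.python.org/3.4/tutorial/interpreter.html
print(attempt_2)  # docs.python.org/3.4/tutorial/interpreter.html
print(fixed)      # interpreter.html
```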

Answer 2

Answered by TigerhawkT3

That doesn't need regex.

import os

for link in link_list:
    file_names.append(os.path.basename(link))
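One caveat: os.path.basename works on the raw string, so a query string or fragment would stay attached to the result. If the scraped links might carry those, a possible variant (not from the original answer) is to extract the path with urllib.parse first. The helper name url_file_name and the example.com link are assumptions made for this sketch.

```python
import posixpath
from urllib.parse import urlsplit


def url_file_name(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path        # e.g. '/3.4/tutorial/interpreter.html'
    return posixpath.basename(path)  # URL paths always use '/'


# Hypothetical input list; the second link is invented for illustration.
link_list = [
    "https://docs.python.org/3.4/tutorial/interpreter.html",
    "https://example.com/files/report.pdf?download=1#page=2",
]
file_names = [url_file_name(link) for link in link_list]
print(file_names)  # ['interpreter.html', 'report.pdf']
```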

Answer 3

Answered by McCroskey

Just use str.split:

url = "/some/url/with/a/file.html"

print(url.split("/")[-1])

# Result should be "file.html"

split gives you a list of the strings that were separated by "/". The [-1] index gives you the last element of that list, which is what you want.
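A brief sketch of what split returns, plus one edge case worth knowing about: if the URL ends with a slash, the last element is an empty string. The directory_url value and the rstrip workaround are additions for illustration, not part of the original answer.

```python
url = "/some/url/with/a/file.html"
segments = url.split("/")
print(segments)      # ['', 'some', 'url', 'with', 'a', 'file.html']
print(segments[-1])  # file.html

# Hypothetical URL that ends with a slash.
directory_url = "/some/url/with/a/"
last = directory_url.split("/")[-1]  # '' because nothing follows the final '/'

# Stripping trailing slashes first returns the last non-empty segment.
trimmed = directory_url.rstrip("/").split("/")[-1]
print(repr(last), trimmed)  # '' a
```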

Answer 4

Answered by deme72

This should work if you plan to use regex. Note that str.replace only matches literal text, so the pattern has to go through re.sub:

import re

for link in link_list:
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)

Answer 5

Answered by dawg

You can use rpartition():

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')

Then take the last part of the 3-element tuple that is returned:

>>> s.rpartition('/')[2]
'interpreter.html'
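One practical difference from rsplit shows up when a string contains no slash at all. The sketch below is an addition for illustration, with made-up variable names: rpartition always returns three elements, so index 2 is safe, whereas rsplit('/', 1) returns a one-element list and index 1 fails.

```python
name = 'interpreter.html'  # no '/' anywhere in the string

# rpartition always returns a 3-tuple; with no separator found,
# the first two elements are empty and the last is the whole string.
parts = name.rpartition('/')
last_from_rpartition = parts[2]

# rsplit returns a single-element list here, so [1] raises IndexError.
pieces = name.rsplit('/', 1)
try:
    pieces[1]
except IndexError:
    index_error_raised = True
else:
    index_error_raised = False

# Indexing with [-1] works for rsplit whether or not a '/' was present.
last_from_rsplit = pieces[-1]

print(parts)               # ('', '', 'interpreter.html')
print(pieces)              # ['interpreter.html']
print(index_error_raised)  # True
```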

Answer 6

Answered by sandoronodi

Here's a more general, regex way of doing this:

    import re
    re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
    'interpreter.html'
