JavaScript regex: capture the filename from a URL without the file extension

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3671522/

Time: 2020-08-23 05:39:48  Source: igfitidea

REGEX: Capture Filename from URL without file extension

javascript regex url

Asked by AyexeM

I am trying to create a JavaScript regex that captures the filename without the file extension. I have read the other posts here, and 'go to this page: http://gunblad3.blogspot.com/2008/05/uri-url-parsing.html' seems to be the default answer. This doesn't seem to do the job for me. So here is how I'm trying to get the regex to work:

  1. Find the last forward slash '/' in the subject string.
  2. Capture everything between that slash and the next period.

The closest I could get was: /([^/]).\w$ which, run with exec() on the string 'http://example.com/index.htm', would capture /index.htm and index.

I need this to only capture index.

Answered by Daniel Vandersluis

var url = "http://example.com/index.htm";
var filename = url.match(/([^\/]+)(?=\.\w+$)/)[0];

Let's go through the regular expression:

[^\/]+    # one or more characters that aren't a slash
(?=       # open a positive lookahead assertion
  \.      # a literal dot character
  \w+     # one or more word characters
  $       # end of string boundary
)         # end of the lookahead

This expression will collect all characters that aren't a slash that are immediately followed (thanks to the lookahead) by an extension and the end of the string -- or, in other words, everything after the last slash and until the extension.
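
A minimal sketch of how the lookahead pattern behaves on a few inputs. The helper name baseName and the sample URLs other than the one above are assumptions made for illustration. Note that match returns null when the URL has no extension, so indexing [0] directly would throw; the helper guards against that.

```javascript
// Hypothetical helper (not from the answer): wraps the lookahead regex
// and returns null instead of throwing when nothing matches.
function baseName(url) {
    var m = url.match(/([^\/]+)(?=\.\w+$)/);
    return m ? m[0] : null;
}

console.log(baseName("http://example.com/index.htm"));     // "index"
console.log(baseName("http://example.com/foo.global.js")); // "foo.global"
console.log(baseName("http://example.com/page"));          // null (no extension)
```

Because the lookahead only checks for the final extension, a name with several dots keeps everything before the last one.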

Alternately, you can do this without regular expressions altogether, by finding the position of the last / and the last . using lastIndexOf and getting a substring between those points:

var url = "http://example.com/index.htm";
var filename = url.substring(url.lastIndexOf("/") + 1, url.lastIndexOf("."));
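
One caveat with the snippet above: for a URL such as http://example.com/page, the last dot is in the host name, so the call would return text from the host rather than the filename. The following is a sketch of one way to guard against that; the helper name baseNameNoRegex and the extensionless sample URL are assumptions for illustration.

```javascript
// Hypothetical helper: same lastIndexOf idea, but a dot only counts as the
// start of an extension when it appears after the last slash.
function baseNameNoRegex(url) {
    var start = url.lastIndexOf("/") + 1;
    var dot = url.lastIndexOf(".");
    // If the last dot comes before the filename (e.g. in the host name),
    // there is no extension, so take everything up to the end of the string.
    var end = dot >= start ? dot : url.length;
    return url.substring(start, end);
}

console.log(baseNameNoRegex("http://example.com/index.htm")); // "index"
console.log(baseNameNoRegex("http://example.com/page"));      // "page"
```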

Answered by BGerrissen

Tested and works, even for pages without a file extension.

var re = /([\w\d_-]*)\.?[^\\\/]*$/i;

var url = "http://stackoverflow.com/questions/3671522/regex-capture-filename-from-url-without-file-extention";
alert(url.match(re)[1]); // 'regex-capture-filename-from-url-without-file-extention'

url = 'http://gunblad3.blogspot.com/2008/05/uri-url-parsing.html';
alert(url.match(re)[1]); // 'uri-url-parsing'

([\w\d_-]*) get a string containing letters, digits, underscores or hyphens.
\.? perhaps the string is followed by a period.
[^\\\/]*$ but certainly not followed by a slash or backslash till the very end.
/i oh yeah, ignore case.
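
To make the breakdown concrete, here is a small sketch that runs the same pattern on a few inputs. The helper name nameFromUrl and the example.com URLs are assumptions for illustration. Since the captured class only allows letters, digits, underscores and hyphens, the capture stops at the first character outside that set.

```javascript
var re = /([\w\d_-]*)\.?[^\\\/]*$/i;

// Hypothetical helper: returns capture group 1 of the pattern above.
function nameFromUrl(url) {
    var m = url.match(re);
    return m ? m[1] : null;
}

console.log(nameFromUrl("http://example.com/index.htm"));      // "index"
console.log(nameFromUrl("http://example.com/uri%20url.html")); // "uri" (stops at "%")
console.log(nameFromUrl("http://example.com/foo.global.js"));  // "foo" (stops at the first dot)
```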

Answered by Adam Lockhart

I did not find any of the answers to be nearly robust enough. Here is my solution.

function getFileName(url, includeExtension) {
    var matches = url && typeof url.match === "function" && url.match(/\/?([^/.]*)\.?([^/]*)$/);
    if (!matches)
        return null;

    if (includeExtension && matches.length > 2 && matches[2]) {
        return matches.slice(1).join(".");
    }
    return matches[1];
}

var url = "http://example.com/index.htm";
var filename = getFileName(url);
// index
filename = getFileName(url, true);
// index.htm

url = "index.htm";
filename = getFileName(url);
// index
filename = getFileName(url, true);
// index.htm

// BGerrissen's examples
url = "http://stackoverflow.com/questions/3671522/regex-capture-filename-from-url-without-file-extention";
filename = getFileName(url);
// regex-capture-filename-from-url-without-file-extention
filename = getFileName(url, true);
// regex-capture-filename-from-url-without-file-extention

url = "http://gunblad3.blogspot.com/2008/05/uri-url-parsing.html";
filename = getFileName(url);
// uri-url-parsing
filename = getFileName(url, true);
// uri-url-parsing.html

// BGerrissen fails
url = "http://gunblad3.blogspot.com/2008/05/uri%20url-parsing.html";
filename = getFileName(url);
// uri%20url-parsing
filename = getFileName(url, true);
// uri%20url-parsing.html

// George Pantazis multiple dots
url = "http://gunblad3.blogspot.com/2008/05/foo.global.js";
filename = getFileName(url);
// foo
filename = getFileName(url, true);
// foo.global.js

// Fringe cases
url = {};
filename = getFileName(url);
// null
url = null;
filename = getFileName(url);
// null

To fit with the original question, the default behavior is to exclude the extension, but that can easily be reversed.
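
None of the examples above include a query string or fragment. As a sketch only, and not part of the answer, the standard URL constructor could be used to drop those parts before extracting the name. The function name fileNameFromUrl and the assumption that the input is an absolute URL are mine.

```javascript
// Assumption: the input is an absolute URL that the URL constructor
// accepts; anything else makes `new URL` throw, which is caught here.
function fileNameFromUrl(url) {
    var path;
    try {
        path = new URL(url).pathname; // drops "?query" and "#hash"
    } catch (e) {
        return null;
    }
    // Last path segment, then everything before its first dot
    // (the same "exclude the extension" default as getFileName).
    var last = path.substring(path.lastIndexOf("/") + 1);
    var dot = last.indexOf(".");
    return dot === -1 ? last : last.substring(0, dot);
}

console.log(fileNameFromUrl("http://example.com/index.htm?x=1#top")); // "index"
console.log(fileNameFromUrl("http://example.com/foo.global.js"));     // "foo"
console.log(fileNameFromUrl("not a url"));                            // null
```

Unlike getFileName, this sketch rejects relative inputs such as "index.htm", so it is a complement rather than a replacement.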

Answered by Colin Hebert

You can try this regex:

([^/]*)\.[^.]*$
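
A short sketch of how this pattern might be used from JavaScript, with the slash escaped for the regex literal. The helper name nameBeforeLastDot and the sample URLs are assumptions. The filename is in capture group 1. One thing to watch: when the URL has no extension, the final [^.]* can run across slashes, so the match can land on the host name instead.

```javascript
var re = /([^\/]*)\.[^.]*$/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function nameBeforeLastDot(url) {
    var m = url.match(re);
    return m ? m[1] : null;
}

console.log(nameBeforeLastDot("http://example.com/index.htm"));     // "index"
console.log(nameBeforeLastDot("http://example.com/foo.global.js")); // "foo.global"
console.log(nameBeforeLastDot("http://example.com/page"));          // "example" (the host, not the file)
```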

Answered by Anurag Anand

Try this regex. It can even handle filenames with multiple periods.

(?<=\/)[^\/]*(?=\.\w+$)
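
A minimal usage sketch for this pattern, assuming a JavaScript engine that supports lookbehind assertions (added in ES2018). The helper name nameBetween and the sample inputs are assumptions. Because both the slash and the extension sit inside assertions, the whole match (index 0) is the filename.

```javascript
// Lookbehind (?<=...) needs an ES2018-capable engine.
var re = /(?<=\/)[^\/]*(?=\.\w+$)/;

// Hypothetical helper: returns the full match, or null when nothing matches.
function nameBetween(url) {
    var m = url.match(re);
    return m ? m[0] : null;
}

console.log(nameBetween("http://example.com/index.htm"));     // "index"
console.log(nameBetween("http://example.com/foo.global.js")); // "foo.global"
console.log(nameBetween("index.htm"));                        // null (no slash before the name)
```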