bash: How do I make a regular expression to match file paths?
Original URL: http://stackoverflow.com/questions/37370301/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I make a regular expression to match file paths?
Asked by Bohemian
I've been playing with this command for about an hour or two and I'm afraid I may have lost objectivity. The goal is to match only relative file paths given to bash.
The first relative path is . or ./some/file/path
The second relative path is .. or ../some/file/path
where the length of "/some/file/path" is arbitrary.
I've been using grep within bash to try to figure out how to implement this in my script, so that I can expand a relative path to its absolute file path: ./some/file/path or ../some/file/path becomes /the/absolute/file/path. The expansion itself I've already figured out.
My problem is matching the relative path.
The code I've been using is
echo "../some/file/path" | egrep '\.{1}/?[[:graph:]]?+$'
and
echo "../some/file/path" | egrep '\.{2}/?[[:graph:]]?+$'
and I've narrowed my issue down to being
echo ".." | egrep '\.{2}'
will match the dot as long as it has 2 + n occurrences, not exactly 2 occurrences as expected. The same thing happens when I change it to
echo ".." | egrep '\.{1}'
will still match for some reason I can't figure out.
The final implementation is supposed to work something like this
_expand_relative_path () {
    # "$1" is an assumption: the operand of =~ appears as an empty string in this copy
    if [[ "$1" =~ ^\.{1}/?[[:graph:]]?+$ ]]; then
        echo "."
    elif [[ "$1" =~ ^\.{2}/?[[:graph:]]?+$ ]]; then
        echo ".."
    else
        echo ""
    fi
}
According to my textbook, the specifier {n} will match the preceding element if it occurs exactly n times. But it doesn't do that! It matches if it occurs n or more times! What am I doing wrong?
Answered by Bohemian
The regex that matches a relative path is one that doesn't start with a slash:
^[^/].*
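As a rough illustration, the same test can be run from a script with grep -E. This is only a sketch: the function name is_relative_path and the sample paths are made up here, and the check treats any non-empty string that does not start with / as relative.

```bash
# Hypothetical helper: succeeds when the argument does not begin with "/".
is_relative_path() {
    # '^[^/]' requires a first character that is anything except a slash.
    printf '%s\n' "$1" | grep -Eq '^[^/]'
}

for p in . .. ./some/file/path ../some/file/path /the/absolute/file/path; do
    if is_relative_path "$p"; then
        echo "relative: $p"
    else
        echo "absolute: $p"
    fi
done
```

The trailing .* in the original pattern is not needed for a yes/no test, since grep already succeeds as soon as the first character matches.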
Answered by Scott Weaver
The issue with ^\.{1}/?[[:graph:]]?+$ is that the / has been designated as optional, and the following [[:graph:]] character class matches anything visible, including more periods. Also, you've quantified your character class with ?+, which in PCRE-style engines means "zero or once, possessive": it ain't gotta match, but if it does, it will not "give up" what it matched to let the rest of the pattern try to succeed, which is probably not what you wanted there. (POSIX ERE, which egrep and bash's =~ use, does not define possessive quantifiers.)
When you say echo ".." | egrep '\.{2}', what you're saying is "string contains, at some point, two periods in a row", but that doesn't mean it can't have more periods or anything else. For that you need the ^ and $ anchors, which would limit the match to exactly and only two periods.
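To see the difference the anchors make, here is a small sketch using grep -E (the current spelling of egrep). The function names and sample strings are invented for illustration.

```bash
# Unanchored: succeeds if two consecutive dots appear anywhere in the line.
contains_two_dots() { printf '%s\n' "$1" | grep -Eq '\.{2}'; }

# Anchored: succeeds only if the whole line is exactly two dots.
is_exactly_two_dots() { printf '%s\n' "$1" | grep -Eq '^\.{2}$'; }

for s in '..' '...' 'a..b'; do
    if contains_two_dots "$s"; then
        echo "unanchored pattern matches: $s"
    fi
    if is_exactly_two_dots "$s"; then
        echo "anchored pattern matches:   $s"
    fi
done
```

Only the first sample satisfies the anchored pattern, while all three contain two dots in a row somewhere.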
As others note, any path not starting with / is relative, so ^[^/].* works. But if you wanted to find relative paths that are in a text file with some other text, this may be useful:
(\.{1,2}(?:\/[[:alnum:]]*)*)
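The (?:...) group in that pattern is Perl-style syntax. As a hedged sketch, an ERE adaptation can be tried with grep -oE, assuming a grep that supports the -o option (GNU and BSD grep do; it is not required by POSIX). The function name extract_relative_paths and the sample sentence are made up.

```bash
# Hypothetical helper: print each relative-path-looking token found on stdin.
# ERE adaptation of the pattern above: (?:...) becomes a plain group, \/ becomes /.
extract_relative_paths() {
    grep -oE '\.{1,2}(/[[:alnum:]]*)*'
}

printf '%s\n' 'copy ./src/main to ../backup/old now' | extract_relative_paths
```

Because the pattern accepts a lone dot, an ordinary sentence-ending period in the input would also be reported, so this is only suitable for text where that is acceptable.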
Answered by John Bollinger
echo ".." | egrep '\.{2}'
will match the dot as long as it has 2 + n occurrences, not exactly 2 occurrences as expected.
Well, yes. By default, grep prints lines that contain the pattern. Any line that contains more than two consecutive dots necessarily contains two consecutive dots, so the pattern matches.
The same thing happens when I change it to
echo ".." | egrep '\.{1}'
will still match for some reason I can't figure out.
Same thing: the string ".." contains a '.', therefore it matches the pattern.
Consider, now, your original pattern, '\.{2}/?[[:graph:]]?+$':
- In the first place, observe that it is not anchored to the beginning of the string, so it will match absolute paths of the form /foo/bar../baz (and others). You need an initial ^ in the pattern to anchor it.
- You make the presence of a / after the leading dots optional by using the ? quantifier. It is unclear why you do this if your objective is specifically to match paths where the first segment is "..". The only thing I can think of is that you want to match the path that is exactly ".." itself, which your pattern does, but it is too accepting.
- The next segment is [[:graph:]]?+, which seems an odd way to write the more standard [[:graph:]]*. Additionally, you seem here to be relying on the fact that [[:graph:]] will match the / character, which it will, so you might as well roll the preceding optional / right into the character class: '^\.{2}[[:graph:]]*$'.
- Now observe that [[:graph:]] also matches the "." character. This explains why the original pattern matches strings that contain more than two consecutive dots: the first two are matched by the \.{2}, nothing is matched by /?, and the remaining dots (and perhaps other characters) are matched by [[:graph:]]?+.
- Finally, consider that \.\. is shorter and clearer than \.{2}, and especially that plain \. is far clearer than \.{1}.
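The effect of [[:graph:]] swallowing extra dots can be checked directly. The following is a sketch only, using the rewritten pattern '^\.{2}[[:graph:]]*$' from the list above. The helper name matches_loose and the sample strings are assumptions made for this example.

```bash
# Pattern from the list above: leading "..", then any run of visible characters.
loose='^\.{2}[[:graph:]]*$'

# Hypothetical helper: succeeds when the whole argument fits the loose pattern.
matches_loose() { printf '%s\n' "$1" | grep -Eq "$loose"; }

for s in '..' '../some/file/path' '....' '..hidden' '/foo/bar../baz'; do
    if matches_loose "$s"; then
        echo "matches:        $s"
    else
        echo "does not match: $s"
    fi
done
```

The strings made of four dots and of two dots followed by a name both match, which shows that anchoring alone does not restrict the first segment to exactly "..".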
Of course, in his answer, @Bohemian presents the natural pattern for matching every possible relative path, but if you wanted a pattern to specifically match paths whose first segments are . or .., including those without other segments, and without a trailing /, then you might try this:
egrep '^\.{1,2}(/.*[^/])?$'
- It is anchored at the beginning (^) and at the end ($), so it performs whole-line matches (only).
- Matching lines must begin with one or two dots (\.{1,2}).
- Anything else is optional ((...)?), but if that optional segment is present then it must begin with a / and end with a character that is not /. In between can be any number, including zero, of any character (.*).
- Note that Unix file and directory names can contain whitespace and non-graphical characters, so using [:graph:] in your original pattern restricted it to a subset of possible paths.
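As one possible way to use that pattern inside a bash function like the one in the question, here is a minimal sketch. The name classify_relative_path and the labels it prints are hypothetical, and it only classifies the path rather than expanding it.

```bash
#!/usr/bin/env bash
# Hypothetical classifier built on the pattern above. Prints "current" or
# "parent" for paths led by . or .., and "other" when the pattern fails.
classify_relative_path() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    local re='^\.{1,2}(/.*[^/])?$'
    if [[ $1 =~ $re ]]; then
        case $1 in
            ..|../*) echo "parent" ;;
            *)       echo "current" ;;
        esac
    else
        echo "other"
    fi
}

classify_relative_path "./some/file/path"
classify_relative_path "../some/file/path"
classify_relative_path "./trailing/slash/"
classify_relative_path "/the/absolute/file/path"
```

The path with a trailing / falls through to "other" because the optional group must end with a character that is not /.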
Answered by Nae
For Windows: ^.*\\(?!.*\\)(.*)$
or for Linux: ^.*/(?!.*/)(.*)$
or for both:
^.*(?:\\|/)(?!.*(?:\\|/))(.*)$
It matches the filename.extension in a .../path/filename.extension or ...\path\filename.extension, as it checks for the last occurrence of either \ or / and then captures every character from that point forward.
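The (?!...) lookahead is a Perl-style feature that bash's [[ =~ ]] and grep -E do not provide. As a hedged sketch for the / separator only, a roughly equivalent capture in bash can swap the lookahead for a negated character class. The function name file_name_of is invented here, and this is a substitute technique rather than the pattern given above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print everything after the last "/" in the argument.
# Returns non-zero when the argument contains no "/" at all.
file_name_of() {
    # [^/]* cannot cross a slash, so the group can only start after the last one.
    local re='^.*/([^/]*)$'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

file_name_of "some/path/filename.extension"
```

For a path that ends in /, the captured name is empty, which mirrors what the lookahead version would capture.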