bash: Matching a File Name Using Grep
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/7801118/
Asked by Jason Zhu
The overarching problem: So I have a file name that comes in the form of JohnSmith14_120325_A10_6.raw and I want to match it using regex. I have a couple of issues in building a working example but unfortunately my issues won't be solved unless I get the basics.
So I have just recently learned about piping and one of the cool things I learned was that I can do the following.
X=ll_paprika.sc   # (don't ask)
VAR=`echo $X | cut -d _ -f 2`
echo $VAR
which gives me paprika.sc. Now when I try to apply the same pipe idea with grep, nothing happens.
x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
Can anyone explain what I am doing wrong?
Second question: How does one match a single underscore using regex?
Here's what I am ultimately trying to do:
VAR=`echo $X | grep -e "^[a-bA-Z][a-bA-Z0-9]*(_){1}[0-9]*(_){1}[a-bA-Z0-9]*(_){1}[0-9](\.){1}(raw)"
So the basic idea of my pattern here is that the file name must start with a letter, and then it can have any number of letters and numbers following it. It must then have an _ to delimit a series of numbers, another _ to delimit the next set of numbers and characters, and another _ to delimit the next set of numbers, and then it must have a single period followed by raw. This looks grossly wrong and ugly (because I am not sure about the syntax). So how does one match a file extension? Can someone put up a simple example for something like ll_paprika.sc so that I can figure out how to do my own regex?
Thanks.
Answered by drysdam
x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
The reason this isn't doing what you want is that grep matches a line and returns it. *.sc does in fact match ll_paprika.sc, so it returns that whole line and sticks it in $VAR.
If you want to just get a part of it, the cut line is probably better. There is a grep -o option that returns only the matching portion, but for this you'd basically have to put in the thing you were looking for, at which point why bother?
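The difference between the whole line, the matched portion, and a cut field can be seen side by side. This is a minimal sketch, not the answerer's own code: the variable names whole, part, and field are made up for illustration, and the pattern is quoted and written as the regular expression '\.sc$' rather than the glob *.sc, because an unquoted *.sc can be expanded by the shell before grep ever sees it.

```shell
X=ll_paprika.sc

# grep prints the entire matching line, not just the part that matched.
whole=$(printf '%s\n' "$X" | grep '\.sc$')

# -o prints only the matched text (available in GNU and BSD grep).
part=$(printf '%s\n' "$X" | grep -o '\.sc$')

# cut splits on a delimiter and keeps the requested field.
field=$(printf '%s\n' "$X" | cut -d _ -f 2)

echo "whole=$whole part=$part field=$field"
# prints: whole=ll_paprika.sc part=.sc field=paprika.sc
```

To pull out everything after the underscore, cut needs only the delimiter, while grep -o needs a pattern that already describes the wanted text.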
the file name must start with a letter
`grep -e "^[a-zA-Z]
and then it can have any number of letters and numbers following it
[a-zA-Z0-9]*
and it must have an _ delimit a series of numbers and another _ to delimit the next set of numbers and characters and another _ to delimit the next set of numbers
(_[0-9]+){3}
and then it must have a single period followed by raw.
\.raw"
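Put together, the fragments above form a single pattern. The sketch below rests on a few assumptions that are not in the answer: it uses grep -E, because +, {3} and unescaped parentheses are extended-regex syntax; it adds a trailing $ so nothing may follow .raw; and it adds a second, relaxed pattern, since the sample name JohnSmith14_120325_A10_6.raw has a letter in its third field, which three digit-only groups would reject. The variable names are illustrative.

```shell
X=JohnSmith14_120325_A10_6.raw

# Literal assembly of the fragments: three digit-only fields.
strict='^[a-zA-Z][a-zA-Z0-9]*(_[0-9]+){3}\.raw$'

# Adaptation (an assumption): the middle field may contain letters, as in A10.
relaxed='^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$'

# grep exits with status 1 when nothing matches; "|| true" keeps that
# from aborting a script that runs under "set -e".
strict_hit=$(printf '%s\n' "$X" | grep -E "$strict" || true)
relaxed_hit=$(printf '%s\n' "$X" | grep -E "$relaxed" || true)

echo "strict:  '$strict_hit'"
echo "relaxed: '$relaxed_hit'"
```

With this sample name the strict pattern prints an empty string and the relaxed one prints the full file name.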
Answered by imm
For the first, use:
VAR=`echo $X | egrep '\.sc$'`
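egrep is equivalent to grep -E, and recent GNU grep releases warn that the egrep name is obsolescent. When only a yes/no answer is needed, the exit status can be tested directly instead of capturing the output. A minimal sketch, with is_sc as a made-up variable name:

```shell
X=ll_paprika.sc

# -q suppresses output; the exit status alone says whether the name ends in .sc.
if printf '%s\n' "$X" | grep -Eq '\.sc$'; then
    is_sc=yes
else
    is_sc=no
fi

echo "$is_sc"
# prints: yes
```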
For the second, you can try this alternative instead:
VAR=`echo $X | egrep '^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw'`
Note that your character classes from your expression differ from the description that follows in that they seem to only be permissive of a-b for lower case characters in some places. This example is permissive of all alphanumeric characters in those places.
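One way to check what the POSIX-class pattern accepts is to run it over a few candidate names. This is a sketch under assumptions: the test names other than the asker's sample are invented, the trailing $ is an addition that stops names such as something.raw.bak from slipping through, and accepted is an illustrative variable.

```shell
# The pattern from the answer above, plus a trailing $ (an addition).
re='^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw$'

accepted=""
for name in \
    JohnSmith14_120325_A10_6.raw \
    9John_1_A_2.raw \
    JohnSmith14_120325_A10_6.raw.bak \
    John_12_A1.raw
do
    # Keep only the names the pattern matches.
    if printf '%s\n' "$name" | grep -Eq "$re"; then
        accepted="$accepted $name"
    fi
done

echo "accepted:$accepted"
# prints: accepted: JohnSmith14_120325_A10_6.raw
```

The second name fails because it starts with a digit, the third because of the extra suffix, and the fourth because it has only two underscore-separated fields after the name.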

