bash: Matching a File Name Using grep

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7801118/

Date: 2020-09-18 00:58:23  Source: igfitidea

Matching A File Name Using Grep

regex, bash

Asked by Jason Zhu

The overarching problem: So I have a file name that comes in the form of JohnSmith14_120325_A10_6.raw and I want to match it using regex. I have a couple of issues in building a working example but unfortunately my issues won't be solved unless I get the basics.

So I have just recently learned about piping and one of the cool things I learned was that I can do the following.

X=ll_paprika.sc (don't ask)
VAR=`echo $X | cut -d _ -f 2`
echo $VAR

which gives me paprika.sc. Now when I try the same pipe idea with grep, nothing happens.

x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR

Can anyone explain what I am doing wrong?

Second question: How does one match a single underscore using regex?

Here's what I am ultimately trying to do;

VAR=`echo $X | grep -e "^[a-bA-Z][a-bA-Z0-9]*(_){1}[0-9]*(_){1}[a-bA-Z0-9]*(_){1}[0-9](\.){1}(raw)"

So the basic idea of my pattern is that the file name must start with a letter, followed by any number of letters and numbers. Then it must have an _ to delimit a series of numbers, another _ to delimit the next set of numbers and characters, and another _ to delimit the next set of numbers, and finally a single period followed by raw. This looks grossly wrong and ugly (because I am not sure about the syntax). So how does one match a file extension? Can someone put up a simple example for something like ll_paprika.sc so that I can figure out how to do my own regex?

Thanks.

Answered by drysdam

x=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR

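The reason this isn't doing what you want is that grep matches a whole line and returns that whole line, never just a piece of it. Two further things get in the way here. First, *.sc is a shell glob, not a regular expression: left unquoted, the shell may expand it to file names in the current directory before grep ever runs, and if grep does receive it as-is, a leading * in a basic regular expression is a literal asterisk, so the pattern does not match ll_paprika.sc and $VAR stays empty. Second, with a pattern that does match, such as '\.sc$', grep returns the entire line ll_paprika.sc, and that is what ends up in $VAR.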

If you just want to get a part of it, the cut line is probably better. There is a grep -o option that returns only the matching portion, but for this you'd basically have to put in the thing you were looking for, at which point why bother?
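
To make the difference concrete, here is a minimal sketch, assuming bash and a grep that supports -o (GNU and BSD grep both do). The variable names whole, part_o, part_cut, and star are only illustrative and do not come from the question.

```shell
#!/usr/bin/env bash
X=ll_paprika.sc

# grep prints every matching line in full.
whole=$(printf '%s\n' "$X" | grep '\.sc$')

# grep -o prints only the matched text, so the pattern itself has to
# spell out exactly the part you want back.
part_o=$(printf '%s\n' "$X" | grep -o 'paprika\.sc$')

# cut splits on a delimiter and keeps the chosen field.
part_cut=$(printf '%s\n' "$X" | cut -d _ -f 2)

# '*.sc' is a glob, not a regex: in a basic regular expression a leading *
# is a literal asterisk, so this finds nothing (|| true hides grep's
# non-zero exit status on no match).
star=$(printf '%s\n' "$X" | grep '*.sc' || true)

printf 'whole=%s\npart_o=%s\npart_cut=%s\nstar=%s\n' \
    "$whole" "$part_o" "$part_cut" "$star"
```

Quoting the pattern matters as much as the pattern itself: an unquoted *.sc can be replaced by file names from the current directory before grep sees it.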

the file name must start with a letter

`grep -E "^[a-zA-Z]

and then it can have any number of letters and numbers following it

[a-zA-Z0-9]*

and it must have an _ delimit a series of numbers and another _ to delimit the next set of numbers and characters and another _ to delimit the next set of numbers

(_[0-9]+){3}

and then it must have a single period following by raw.

\.raw"
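
Put together, the fragments above form one pattern. The sketch below is one possible assembly, not the answerer's exact command: it assumes grep -E (the group, + and {3} need extended syntax), an escaped dot, and an added trailing $ anchor. The names strict, loose, matches, and report are hypothetical helpers. The loose variant is an assumption too, added because the sample file name has letters in its third field (A10), which (_[0-9]+){3} does not allow.

```shell
#!/usr/bin/env bash
# Fragments joined as written: three underscore-prefixed runs of digits.
strict='^[a-zA-Z][a-zA-Z0-9]*(_[0-9]+){3}\.raw$'
# Assumed variant: the third field may also contain letters (e.g. A10).
loose='^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$'

matches() {  # usage: matches PATTERN NAME; exit status 0 on a match
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

report() {  # usage: report LABEL PATTERN NAME
    if matches "$2" "$3"; then
        echo "$1 matches $3"
    else
        echo "$1 rejects $3"
    fi
}

report strict "$strict" JohnSmith14_120325_110_6.raw
report strict "$strict" JohnSmith14_120325_A10_6.raw
report loose  "$loose"  JohnSmith14_120325_A10_6.raw
```

The strict pattern accepts the all-digit name and rejects the one containing A10, while the loose pattern accepts the file name from the question.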

Answered by imm

For the first, use:

VAR=`echo $X | egrep '\.sc$'`
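
As a small illustration of why that pattern escapes the dot and ends with $, the sketch below counts matches among three made-up names (the .bak and Xsc names are invented for the test, and escaped_count and bare_count are illustrative variables). It uses grep -E, which is the same matcher as egrep; recent GNU grep releases print a warning that egrep is obsolescent.

```shell
#!/usr/bin/env bash
names='ll_paprika.sc
ll_paprika.sc.bak
ll_paprikaXsc'

# Escaped dot plus $: only a name ending in the literal ".sc" matches.
escaped_hits=$(printf '%s\n' "$names" | grep -E '\.sc$')
escaped_count=$(printf '%s\n' "$names" | grep -Ec '\.sc$')

# Bare dot: "." matches any single character, so "Xsc" also qualifies.
bare_count=$(printf '%s\n' "$names" | grep -Ec '.sc$')

echo "escaped: $escaped_count line(s): $escaped_hits"
echo "bare:    $bare_count line(s)"
```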

For the second, you can try this alternative instead:

VAR=`echo $X | egrep '^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw'`

Note that the character classes in your expression differ from the description that follows it: in some places they permit only a-b for lowercase characters. This example permits all alphanumeric characters in those places.
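
The effect of a-b versus a full alphabetic class can be checked directly. In this sketch, narrow is a simplified rendering of the question's classes rather than its exact pattern, posix is the pattern above with an assumed trailing $ so that nothing may follow .raw, and check is a hypothetical helper.

```shell
#!/usr/bin/env bash
name=JohnSmith14_120325_A10_6.raw

# The question's classes: a-b covers only the letters a and b.
narrow='^[a-bA-Z][a-bA-Z0-9]*_[0-9]+_[a-bA-Z0-9]+_[0-9]+\.raw$'
# POSIX classes, with an assumed trailing $ anchor.
posix='^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw$'

check() {  # usage: check PATTERN NAME; prints "match" or "no match"
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

echo "narrow: $(check "$narrow" "$name")"
echo "posix:  $(check "$posix" "$name")"
echo "posix with suffix: $(check "$posix" "$name.bak")"
```

The narrow pattern fails on the "o" in JohnSmith, the POSIX-class pattern accepts the name, and the $ anchor rejects the same name once .bak is appended.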
