bash script to remove strings and keep numbers from files

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/12396673/

Time: 2020-09-18 03:15:55  Source: igfitidea

script to remove string and keep numbers from files

bash sed awk

Asked by Khaled Lela

The titles of my files are formatted like the following:

  • fileName01
  • file07
  • fileTitle8
  • fileName20

There is no delimiter between the string and the numbers, and the string part does not have the same number of characters on each line.

I want to output just the numbers from the end of the filename:

  • 01
  • 07
  • 8
  • 20

Answered by P.P

Use tr:

tr -d '[:alpha:]' < filename
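
Deleting only letters leaves any punctuation or spaces in the names behind. A minimal sketch, assuming the names arrive on standard input (the function name `digits_only` is made up for illustration), deletes the complement of digits and newlines instead:

```bash
#!/usr/bin/env bash
# digits_only: hypothetical helper; reads lines on stdin and deletes every
# character that is not a digit or a newline.
digits_only() {
    # -c complements the set, -d deletes: everything except digits and \n goes.
    tr -cd '[:digit:]\n'
}

printf '%s\n' fileName01 file07 fileTitle8 file-name_20 | digits_only
```

With the sample names above this prints 01, 07, 8 and 20, one per line, including for the name that contains `-` and `_`.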

Answered by newfurniturey

If you specifically want only the numbers and there is a possibility of non-alphanumeric characters, you can use sed and [^0-9]:

cat filename | sed 's|[^0-9]||g'

Additionally, if duplicates are possible and order is not an issue, you can combine this with sort and uniq:

cat filename | sed 's|[^0-9]||g' | sort | uniq

This last example will give you a distinct list of the numbers found in the file; however, it does respect a leading 0 (i.e., 8 != 08).
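
To illustrate the leading-zero behaviour, here is a small sketch with made-up sample names. The function name `distinct_numbers` is hypothetical, and `sort -u` is used as a shorthand for `sort | uniq`:

```bash
#!/usr/bin/env bash
# distinct_numbers: hypothetical helper; strips non-digits from each line on
# stdin and prints each distinct result once.
distinct_numbers() {
    # LC_ALL=C makes the sort order byte-wise and predictable;
    # sort -u gives the same result as sort | uniq.
    sed 's/[^0-9]//g' | LC_ALL=C sort -u
}

printf '%s\n' file8 name08 title8 part01 | distinct_numbers
```

The two inputs that reduce to 8 collapse into one line, while 08 is kept as a separate entry because the comparison is textual, not numeric.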

UPDATE (bash-only):

while IFS= read -r line; do
    echo "${line//[^0-9]/}"
done < filename

Though less readable (from my point of view), it is a viable alternative that accomplishes the same goal. Appending | sort | uniq will still work with this example too.

EDIT (file-extensions)
To keep file extensions (or any text after the first instance of numbers), per a comment by the OP, removing the g from the sed command and adding a * will handle this:

cat filename | sed 's|[^0-9]*||'

This keeps everything from the first instance of numbers onward, so filename123.mp3 becomes 123.mp3, and file123part456.txt becomes 123part456.txt.

If you need a stricter match that gets only the last numbers and any existing file extension (allowing for no file extension at all, as in the examples in the original question), you can use grep with the -P and -o flags:

grep -Po '[0-9]*(\..*)?$' filename

This will cause filename123.mp3 to return 123.mp3, and file123part456.txt to return 456.txt. The -P flag tells grep to interpret the pattern as a Perl regular expression; the -o flag tells it to return only the matching part of each line, not the full line that matches.
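
Where `grep -P` is not available (it needs a grep built with PCRE support), a similar match can be sketched with bash's own `=~` operator. The function name `last_number_with_ext` is an assumption for illustration, and the pattern uses `+` rather than `*` so that a name without digits prints nothing:

```bash
#!/usr/bin/env bash
# last_number_with_ext: hypothetical helper; prints the first run of digits
# that is followed either by the end of the name or by a dot and the rest
# of the name.
last_number_with_ext() {
    local name=$1
    # Keep the pattern in a variable so bash treats it as a regex.
    local re='[0-9]+(\..*)?$'
    if [[ $name =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"
    fi
}

for name in fileName01 filename123.mp3 file123part456.txt; do
    last_number_with_ext "$name"
done
```

For these three names the loop prints 01, 123.mp3 and 456.txt. A name with digits before more than one dot, such as file1.part2.txt, would match from the first of those digit runs, so this is a sketch for names shaped like the ones in the question rather than a general parser.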

Answered by Thor

I would use grep -o for the question posted by the OP:

grep -o '[0-9]*' filenames

Edit

In the comments, the OP asked how to remove the leading text; in that case use:

sed 's/[^0-9]*//' filename
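
The two commands behave differently on names with more than one group of digits. A small sketch with made-up sample names, assuming GNU grep and sed (both function names are hypothetical):

```bash
#!/usr/bin/env bash
# Both helpers read names on stdin; the names are illustrative only.

# all_digit_runs: prints every non-empty run of digits on its own line.
all_digit_runs() {
    grep -o '[0-9]*'
}

# strip_leading_text: removes only the non-digit text before the first digit.
strip_leading_text() {
    sed 's/[^0-9]*//'
}

printf '%s\n' fileName01 file123part456.txt | all_digit_runs
printf '%s\n' fileName01 file123part456.txt | strip_leading_text
```

The first pipeline splits file123part456.txt into 123 and 456 on separate lines, while the second keeps the remainder of the name intact as 123part456.txt.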

Answered by potong

This might work for you (GNU sed):

echo filename123onetwothree.999 | sed 's/.*[^0-9]\([0-9]*\)$/\1/'
999

This extracts only the numbers from the end of the filename.

To make it universal, use:

sed 's/.*[^[:digit:]]\([[:digit:]]*\)$/\1/' file
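
A sketch of the capture-group approach applied to the names from the question, plus one name with two groups of digits. The function name `trailing_digits` is hypothetical:

```bash
#!/usr/bin/env bash
# trailing_digits: hypothetical helper; for each line on stdin, keeps only
# the digits at the very end of the line.
trailing_digits() {
    # .* is greedy, so [^[:digit:]] lands on the last non-digit character
    # and \1 captures whatever digits follow it up to the end of the line.
    sed 's/.*[^[:digit:]]\([[:digit:]]*\)$/\1/'
}

printf '%s\n' fileName01 file07 fileTitle8 fileName20 file123part456 | trailing_digits
```

This prints 01, 07, 8, 20 and 456. A line made up only of digits contains no non-digit for the pattern to match, so sed leaves it unchanged, which happens to be the desired result.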

Answered by Jeremy J Starcher

Assuming ASCII strings:

echo "HelloTrailz23" | tr -d '[A-Z][a-z]'

If you are dealing with Unicode file names, all bets are off.

Answered by mikel

I always like to use bash's variable string manipulation. It's overkill, but it quickly works on the command line.

for i in fileName01 file07 fileTitle8 fileName20 file123._mp3; do echo "${i//[!0-9]}"; done

Result:

01
07
8
20
1233

The //[!0-9] within the ${i} variable removes everything except numbers that are in each string as it loops through the list.
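
Because `//` deletes every non-digit, digits from different parts of a name are run together (hence 1233 in the result above). If only the trailing digits are wanted, a sketch using the `##` prefix-removal expansion could look like this. The function name `trailing_number` is an assumption for illustration:

```bash
#!/usr/bin/env bash
# trailing_number: hypothetical helper; prints the digits at the end of $1.
trailing_number() {
    local name=$1
    # ##*[!0-9] removes the longest prefix that ends in a non-digit,
    # leaving only the digits that follow the last non-digit character.
    printf '%s\n' "${name##*[!0-9]}"
}

for i in fileName01 file07 fileTitle8 fileName20 file123._mp3; do
    trailing_number "$i"
done
```

For the same list this prints 01, 07, 8, 20 and 3: only the final 3 of file123._mp3 survives, since everything up to the last non-digit is removed.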
