Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow.
Original: http://stackoverflow.com/questions/20629302/

Date: 2020-09-18 09:01:21. Source: igfitidea

Better way to rename files based on multiple patterns

Tags: linux, bash, shell, unix, sed

Asked by user3100854

A lot of the files I download have crap/spam in their filenames, e.g.

[ www.crap.com ] file.name.ext

www.crap.com - file.name.ext

I've come up with two ways for dealing with them but they both seem pretty clunky:

with parameter expansion:

if [[ ${base_name} != ${base_name//\[+([^\]])\]} ]]
then
    mv -v "${dir_name}/${base_name}" "${dir_name}/${base_name//\[+([^\]])\]}" &&
        base_name="${base_name//\[+([^\]])\]}"
fi

if [[ ${base_name} != ${base_name//www.*.com - /} ]]
then
    mv -v "${dir_name}/${base_name}" "${dir_name}/${base_name//www.*.com - /}" &&
        base_name="${base_name//www.*.com - /}"
fi

# more of these type of statements; one for each type of frequently-encountered pattern

and then with echo/sed:

tmp=$(echo "${base_name}" | sed -e 's/\[[^][]*\]//g' | sed -e 's/\s-\s//g')
mv "${base_name}" "${tmp}"

I feel like the parameter expansion is the worse of the two but I like it because I'm able to keep the same variable assigned to the file for further processing after the rename (the above code is used in a script that's called for each file after the file download is complete).

So anyway I was hoping there's a better/cleaner way to do the above that someone more knowledgeable than myself could show me, preferably in a way that would allow me to easily reassign the old/original variable to the new/renamed file.

Thanks

Answered by F. Hauri

Two answers: using perl's rename or using pure bash

Since some people dislike perl, I also wrote my bash-only version.

Renaming files by using the rename command

Introduction

Yes, this is a typical job for the rename command, which was designed for precisely this:

man rename | sed -ne '/example/,/^[^ ]/p'
   For example, to rename all files matching "*.bak" to strip the
   extension, you might say

           rename 's/\.bak$//' *.bak

   To translate uppercase names to lower, you'd use

           rename 'y/A-Z/a-z/' *

More targeted examples

Simply drop all spaces and square brackets:

rename 's/[ \[\]]*//g;' *.ext

Rename all .jpg files by numbering them from 1:

rename 's/^.*$/sprintf "IMG_%05d.JPG",++$./e' *.jpg

Demo:

touch {a..e}.jpg
ls -ltr
total 0
-rw-r--r-- 1 user user 0 sep  6 16:35 e.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 d.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 c.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 b.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 a.jpg
rename 's/^.*$/sprintf "IMG_%05d.JPG",++$./e' *.jpg
ls -ltr
total 0
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00005.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00004.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00003.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00002.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00001.JPG

Full syntax matching the SO question, in a safe way

There is a robust and safe way using the rename utility:

As this is a common perl tool, we have to use perl syntax:

rename 'my $o=$_;
        s/[ \[\]]+/-/g;
        s/-+/-/g;
        s/^-//g;
        s/-\(\..*\|\)$//g;
        s/(.*[^\d])(|-(\d+))(\.[a-z0-9]{2,6})$/
                my $i=$3;
                $i=0 unless $i;
                sprintf("%s-%d%s", $1, $i+1, $4)
            /eg while
               $o ne $_  &&
               -f $_;
    ' *

Testing rule:

touch '[ www.crap.com ] file.name.ext' 'www.crap.com - file.name.ext'
ls -1
[ www.crap.com ] file.name.ext
www.crap.com - file.name.ext
rename 'my $o=$_; ...
    ...
    ...' *
ls -1
www.crap.com-file.name-1.ext
www.crap.com-file.name.ext

touch '[ www.crap.com ] file.name.ext' 'www.crap.com - file.name.ext'
ls -1
www.crap.com-file.name-1.ext
[ www.crap.com ] file.name.ext
www.crap.com - file.name.ext
www.crap.com-file.name.ext
rename 'my $o=$_; ...
    ...
    ...' *
ls -1
www.crap.com-file.name-1.ext
www.crap.com-file.name-2.ext
www.crap.com-file.name-3.ext
www.crap.com-file.name.ext

... and so on...

... and it is safe as long as you don't pass the -f flag to the rename command: files won't be overwritten, and you will get an error message if something goes wrong.

Renaming files by using bash and so-called bashisms

I prefer doing this with a dedicated utility, but it can even be done in pure bash (that is, without any fork).

No binary other than bash is used (no sed, awk, tr or other):

#!/bin/bash

for file; do
    # Replace spaces and square brackets with dots.
    newname=${file//[ \]\[]/.}
    # Strip leading dots.
    while [ "$newname" != "${newname#.}" ]; do
        newname=${newname#.}
    done
    # Collapse any pair of '.' or '-' into a single '-'.
    while [ "$newname" != "${newname//[.-][.-]/-}" ]; do
        newname=${newname//[.-][.-]/-}
    done
    if [ "$file" != "$newname" ]; then
        if [ -f "$newname" ]; then
            # Target exists: split into name, counter and extension,
            # then increment the counter until the name is free.
            ext=${newname##*.}
            basename=${newname%."$ext"}
            partname=${basename%%-[0-9]}
            count=${basename#"${partname}"-}
            [ "$partname" = "$count" ] && count=0
            while printf -v newname "%s-%d.%s" "$partname" "$((++count))" "$ext" &&
                  [ -f "$newname" ]; do
                :
            done
        fi
        mv "$file" "$newname"
    fi
done

To be run with files as arguments, for example:

/path/to/my/script.sh \[*

  • Replace spaces and square brackets with dots.
  • Replace sequences of .-, -., -- or .. with a single -.
  • If the filename is unchanged, there is nothing to do.
  • Test whether a file already exists with the new name...
  • ...if so, split the filename into name, counter and extension to build an indexed new name,
  • and loop while a file exists with that new name.
  • Finally, rename the file.
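
The question also asks for the cleaned name to stay available in a variable after the rename. A minimal sketch of that idea, assuming the same substitutions as the script above: the function name sanitize_name and the variable names are made up for this illustration, and the function only computes the name, it renames nothing.

```shell
#!/bin/bash
# sanitize_name NAME VARNAME
#  Hypothetical helper (not part of the original answer): apply the first
#  two steps listed above to NAME and store the result in the variable
#  called VARNAME, using bash builtins only.
sanitize_name() {
    local _n=$1
    _n=${_n//[ \]\[]/.}                          # spaces and brackets -> dots
    while [ "$_n" != "${_n#.}" ]; do             # strip leading dots
        _n=${_n#.}
    done
    while [ "$_n" != "${_n//[.-][.-]/-}" ]; do   # collapse .-, -., -- and ..
        _n=${_n//[.-][.-]/-}
    done
    printf -v "$2" '%s' "$_n"
}

sanitize_name '[ www.crap.com ] file.name.ext' cleaned
echo "$cleaned"
# prints: www.crap.com-file.name.ext
```

A caller could then run mv on the original name and "$cleaned", and keep using "$cleaned" for further processing.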

Answered by Michael Le Barbier Grünewald

Take advantage of the following classical pattern:

 job_select /path/to/directory| job_strategy | job_process

where job_select is responsible for selecting the objects of your job, job_strategy prepares a processing plan for these objects, and job_process eventually executes the plan.

This assumes that filenames contain neither a vertical bar | nor a newline character.

The job_select function

 # job_select PATH
 #  Produce the list of files to process
 job_select()
 {
   find "$1" -name 'www.*.com - *' -o -name '\[*\]*'
 }

The find command can examine all properties of the file maintained by the file system, like creation time, access time and modification time. It is also possible to control how the filesystem is explored, by telling find not to descend into mounted filesystems or by limiting how many levels of recursion are allowed. It is common to append pipes to the find command to perform more complicated selections based on the filename.

Avoid the common pitfall of including the contents of hidden directories in the output of the job_select function. For instance, the directories CVS, .svn, .svk and .git are used by the corresponding source control management tools, and it is almost always wrong to include their contents in the output of the job_select function. By inadvertently batch processing these files, one can easily make the affected working copy unusable.
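
One way to sidestep that pitfall is to prune those directories inside find itself. The following is a sketch only: the name job_select_safe is made up for this illustration, and the two filename patterns are assumptions modelled on the examples in the question.

```shell
# job_select_safe PATH
#  Hypothetical variant of job_select that never descends into
#  CVS, .svn, .svk or .git directories.
job_select_safe()
{
  find "$1" \
    \( -type d \( -name CVS -o -name .svn -o -name .svk -o -name .git \) -prune \) \
    -o \( -type f \( -name 'www.*.com - *' -o -name '\[*\]*' \) -print \)
}
```

It would be called like job_select, for example job_select_safe /path/to/directory. The -prune branch stops find from entering a matching directory, so nothing below it reaches the rest of the pipeline.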

The job_strategy function

# job_strategy
#  Prepare a plan for renaming files
job_strategy()
{
  sed -e '
    h
    s@/www\..*\.com - *@/@
    s@/\[[^]]*\][ -]*@/@
    x
    G
    s/\n/|/
  '
}

This command reads the output of job_select and makes a plan for our renaming job. The plan is represented by text lines having two fields separated by the character |, the first field being the old name of the file and the second being the newly computed name of the file. It looks like

[ www.crap.com ] file.name.1.ext|file.name.1.ext
www.crap.com - file.name.2.ext|file.name.2.ext

The particular program used to produce the plan is essentially irrelevant, but it is common to use sed, as in the example, or awk or perl for this. Let us walk through the sed script used here:

h       Replace the contents of the hold space with the contents of the pattern space.
…       Edit the contents of the pattern space.
x       Swap the contents of the pattern and hold spaces.
G       Append a newline character followed by the contents of the hold space to the pattern space.
s/\n/|/ Replace the newline character in the pattern space by a vertical bar.
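
To watch these commands act on one line, the sequence can be run on a single sample path. This is a small sketch: the function name plan_line and the sample path are made up for this illustration, and only the www.*.com substitution is included as the edit step.

```shell
# plan_line PATH
#  Hypothetical helper: run one path through the same
#  h / edit / x / G / s sequence as job_strategy.
plan_line()
{
  printf '%s\n' "$1" | sed -e '
    h
    s@/www\..*\.com - *@/@
    x
    G
    s/\n/|/
  '
}

plan_line './www.crap.com - file.name.2.ext'
# prints: ./www.crap.com - file.name.2.ext|./file.name.2.ext
```

The h command saves the untouched line in the hold space before the edit, so after x and G the pattern space holds the old name, a newline, then the new name.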

It can be easier to use several filters to prepare the plan. Another common case is the use of the stat command to add creation times to file names.

The job_process function

# job_process
#  Rename files according to a plan
job_process()
{
   local oldname
   local newname
   while IFS='|' read -r oldname newname; do
     mv "$oldname" "$newname"
   done
}

The input field separator IFS is adjusted to let the function read the output of job_strategy. Declaring oldname and newname as local is useful in large programs but can be omitted in very simple scripts. The job_process function can be adjusted to avoid overwriting existing files and to report the problematic items.
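
A sketch of such an adjustment, assuming that skipping the file and printing a message on standard error is an acceptable way to report a problem (the name job_process_safe is made up for this illustration):

```shell
# job_process_safe
#  Hypothetical variant of job_process: skip any rename whose target
#  already exists and report it on standard error.
job_process_safe()
{
  local oldname
  local newname
  while IFS='|' read -r oldname newname; do
    if [ -e "$newname" ]; then
      printf 'skipped (target exists): %s\n' "$oldname" >&2
    else
      mv -- "$oldname" "$newname"
    fi
  done
}
```

It reads the same plan as job_process. Note that the existence check and the mv are two separate steps, so this guards against accidents, not against another process creating the target in between.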

About data structures in shell programs

Note the use of pipes to transfer data from one stage to the other: apprentices often rely on variables to represent such information, but that turns out to be a clumsy choice. Instead, it is preferable to represent data as tabular files or as tabular data streams moving from one process to the other. In this form, data can easily be processed by powerful tools like sed, awk, join, paste and sort, to cite only the most common ones.
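
As a small illustration of treating the plan as a tabular stream, the following sketch lists every new name that more than one file would be renamed to. The function name plan_conflicts is made up for this illustration; it assumes the old|new plan format described above.

```shell
# plan_conflicts
#  Hypothetical filter: read a plan (old|new, one pair per line) and
#  print each target name that appears more than once.
plan_conflicts()
{
  cut -d'|' -f2 | sort | uniq -d
}

printf '%s\n' 'a.ext|x.ext' 'b.ext|x.ext' 'c.ext|y.ext' | plan_conflicts
# prints: x.ext
```

Such a filter could be run on the output of job_strategy before the plan is handed to job_process.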

Answered by Jahid

You can use rnm

rnm -rs '/\[crap\]|\[spam\]//g' *.ext

The above will remove [crap] or [spam] from the filename.

You can pass multiple regex patterns by terminating them with ; or by repeating the -rs option.

rnm -rs '/[\[\]]//g;/\s*\[crap\]//g' -rs '/crap2//' *.ext

The general format of this replace string is /search_part/replace_part/modifier

  1. search_part: regex to search for.
  2. replace_part: string to replace with
  3. modifier: i (case insensitive), g (global replace)

uppercase/lowercase:

A replace string of the form /search_part/\c/modifier will make the part of the filename selected by the regex search_part lowercase, while \C (capital C) in the replace part will make it uppercase.

rnm -rs '/[abcd]/\C/g' *.ext
## this will capitalize all a,b,c,d in the filenames



If you have many regex patterns to apply, put them in a file and pass that file with the -rs/f option.

rnm -rs/f /path/to/regex/pattern/file *.ext

You can find some other examples here.

Note:

  1. rnm uses PCRE2 (revised PCRE) regex.
  2. You can undo an unwanted rename operation by running rnm -u

P.S: I am the author of this tool.

Answered by Stefano Falsetto

If you want to use something not depending on perl, you can use the following code (let's call it sanitizeNames.sh). It is only showing a few cases, but it's easily extensible using string substitution, tr (and sed too).

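    #!/bin/bash

    # Process the files given as arguments, or every file in the
    # current directory when called without arguments.
    [ $# -eq 0 ] && set -- *

    for f; do
      # Delete "[" and spaces, translate "]" to a dash,
      # then squeeze repeated dashes and repeated dots.
      newfname=$(printf '%s\n' "$f" | tr -d '\[ ' | tr '\]' '-' | tr -s '-' | tr -s '.')
      newfname=${newfname//-./-}

      # Nothing to do when the name is already clean.
      [ "$f" = "$newfname" ] && continue

      if [ -f "$newfname" ]; then
        # Some string magic: split off the extension and any "-N" suffix,
        # then number the file after the count of existing siblings.
        extension=${newfname##*.}
        basename=${newfname%.*}
        basename=${basename%-[1-9]*}
        lastNum=$(ls -d -- "$basename"* | wc -l)
        mv -- "$f" "$basename-$((lastNum)).$extension"
      else
        mv -- "$f" "$newfname"
      fi
    done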

And use it:

    $ touch '[ www.crap.com ] file.name.ext' 'www.crap.com - file.name.ext' '[ www.crap.com ] - file.name.ext' '[www.crap.com ].file.anothername.ext2' '[www.crap.com ].file.name.ext'
    $ ls -1 *crap*
    [ www.crap.com ] - file.name.ext
    [ www.crap.com ] file.name.ext
    [www.crap.com ].file.anothername.ext2
    [www.crap.com ].file.name.ext
    www.crap.com - file.name.ext
    $ ./sanitizeNames.sh *crap*
    $ ls -1 *crap*
    www.crap.com-file.anothername.ext2
    www.crap.com-file.name-1.ext
    www.crap.com-file.name-2.ext
    www.crap.com-file.name-3.ext
    www.crap.com-file.name.ext

Answered by Sandeep

If you are using Ubuntu/Debian, use the rename command to rename multiple files at a time.