bash - How to grep for a pattern in the files in a tar archive without filling up disk space

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/13041068/

Time: 2020-09-18 03:37:50  Source: igfitidea

linux bash shell tar

Asked by Ankur Agarwal

I have a very big tar archive, about 5 GB.

I want to grep for a pattern in all files in the archive (and also print the name of each file that contains the pattern), but I don't want to fill up my disk by extracting the archive.

Is there any way I can do that?

I tried these, but they don't give me the names of the files that contain the pattern, just the matching lines:

tar -O -xf test.tar.gz | grep 'this'
tar -xf test.tar.gz --to-command='grep awesome'

Also where is this feature of tar documented? tar xf test.tar $FILE
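
Naming a member after the archive, as in tar xf test.tar $FILE, restricts extraction to that member only; adding -O (--to-stdout) sends it to standard output instead of the disk. A small sketch, assuming GNU tar; the archive and file names are made up for the demo:

```shell
#!/usr/bin/env bash
# Sketch: extract ONE named member to stdout (-O); nothing is written to disk
# except the throwaway demo archive itself.
set -eu

workdir=$(mktemp -d)
printf 'this is awesome\n' > "$workdir/one.txt"
printf 'something else\n' > "$workdir/two.txt"
tar -cf "$workdir/test.tar" -C "$workdir" one.txt two.txt

# Only the member named one.txt is extracted, and only to stdout.
member_text=$(tar -xOf "$workdir/test.tar" one.txt)
echo "$member_text"
```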

Answered by Petr Pudlák

Seems like nobody posted this simple solution that processes the archive only once:

tar xzf archive.tgz --to-command \
    'grep --label="$TAR_FILENAME" -H PATTERN ; true'

Here tar passes the name of each file in the TAR_FILENAME environment variable (see the docs), and grep uses it via --label to print the name with each match. The ; true is added so that tar doesn't complain when grep exits with a non-zero status for files that don't match.
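
The same one-pass idea can be wrapped in a reusable function. A minimal sketch, assuming GNU tar (for --to-command and TAR_FILENAME) and GNU grep (for --label); the function name targrep_once, the PATTERN variable, and the demo files are hypothetical. The pattern reaches tar's child shell through the environment, so it never has to be spliced into the command string:

```shell
#!/usr/bin/env bash
# Sketch: grep every member of a tar archive in ONE pass, extracting nothing
# to disk. Assumes GNU tar and GNU grep.
set -eu

# Hypothetical helper; usage: targrep_once PATTERN ARCHIVE
targrep_once() {
  # PATTERN is exported only to tar (and so to its child shell);
  # TAR_FILENAME is set by tar for each member it pipes to the command.
  PATTERN="$1" tar -xf "$2" --to-command \
    'grep --label="$TAR_FILENAME" -H -e "$PATTERN"; true'
}

# Demo on a throwaway gzip-compressed archive.
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" a.txt b.txt

targrep_once hello "$workdir/demo.tar.gz"   # prints: a.txt:hello world
```

No -z is needed when reading: GNU tar detects the compression of an archive file on its own.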

Answered by ghoti

Here's my take on this:

while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')

Broken out for explanation:

  • while read filename; do -- it's a loop...
  • tar -xOf file.tar "$filename" -- this extracts each file to standard output (-O), so nothing is written to disk...
  • | grep 'pattern' -- here's where you put your pattern...
  • | sed "s|^|$filename:|"; -- prepend the filename, so this looks like grep output. Salt to taste.
  • done < <(tar -tf file.tar | grep -v '/$') -- end the loop, and get the list of files (skipping directories, whose names end in /) to feed to your while read.

One proviso: this breaks if your filenames contain pipe characters (|), because | is the delimiter in the sed expression.
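
One way around that proviso is to let grep print the name itself instead of sed. A sketch of the same per-member loop, assuming GNU grep for --label; targrep_loop and the demo files are hypothetical names. It still reads the archive once per member:

```shell
#!/usr/bin/env bash
# Sketch: same loop as above, but grep --label adds the file name, so '|'
# (or '&', '\') in a member name cannot break a sed substitution.
set -eu

# Hypothetical helper; usage: targrep_loop PATTERN ARCHIVE
targrep_loop() {
  local pattern=$1 archive=$2
  # List members, skip directories, then extract each one to stdout.
  tar -tf "$archive" | grep -v '/$' | while IFS= read -r filename; do
    tar -xOf "$archive" "$filename" \
      | grep --label="$filename" -H -e "$pattern" || true
  done
}

# Demo archive containing a member with a '|' in its name.
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/a.txt"
printf 'say hello\n' > "$workdir/odd|name.txt"
printf 'nothing here\n' > "$workdir/b.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" a.txt 'odd|name.txt' b.txt

targrep_loop hello "$workdir/demo.tar"
```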

Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:

targrep() {

  local taropt="" pattern="$1" filename

  if [[ $# -lt 2 ]]; then
    echo "Usage: targrep pattern file ..." >&2
    return 1
  fi

  while [[ -n "$2" ]]; do

    if [[ ! -f "$2" ]]; then
      echo "targrep: $2: No such file" >&2
      shift
      continue
    fi

    case "$2" in
      *.tar.gz|*.tgz) taropt="-z" ;;
      *) taropt="" ;;
    esac

    # $1 is shifted away below, so the pattern was saved in $pattern.
    while IFS= read -r filename; do
      tar $taropt -xOf "$2" "$filename" \
       | grep -- "$pattern" \
       | sed "s|^|$filename:|";
    done < <(tar $taropt -tf "$2" | grep -v '/$')

  shift

  done
}

Answered by Steve

Here's a bash function that may work for you. Add the following to your ~/.bashrc:

targrep () {
    # $1 = gzip-compressed tar archive, $2 = pattern
    tar -tzf "$1" | grep -v '/$' | while IFS= read -r i; do
        tar -Oxzf "$1" "$i" | grep --label="$i" -H -- "$2"
    done
}

Usage:

targrep archive.tar.gz "pattern"

Answered by aecolley

It's incredibly hacky, but you could abuse tar's -v option to process and delete each file as it is extracted.

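grep_and_delete() {
  if [ -n "$1" ] && [ -f "$1" ]; then
    grep -H 'this' -- "$1" </dev/null
    rm -f -- "$1" </dev/null
  fi
}
mkdir tmp && cd tmp
tar -xvzf ../test.tar.gz | (
  prev=''
  # Handle each file only once tar announces the next one,
  # so the previous file should be fully written by then.
  while IFS= read -r pathname; do
    grep_and_delete "$prev"
    prev="$pathname"
  done
  grep_and_delete "$prev"
)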

Answered by Op De Cirkel

tar -tf test.tar.gz | grep -v '/$' | \
xargs -d '\n' -n 1 \
sh -c 'tar -xOf test.tar.gz "$1" | grep -q "<YOUR SEARCH PATTERN>" && echo "$1"' sh
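
The command above prints only the names of matching files (much like grep -l) but re-reads the archive once per member. A sketch of the same output in a single pass, borrowing the --to-command approach from the first answer; it assumes GNU tar, and tar_files_matching plus the demo files are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: print the names of members whose contents match, reading the
# archive only once. Assumes GNU tar (--to-command, TAR_FILENAME).
set -eu

# Hypothetical helper; usage: tar_files_matching PATTERN ARCHIVE
tar_files_matching() {
  # grep -q may stop reading at the first match, so cat drains the rest of
  # the member and tar never writes into a closed pipe.
  PATTERN="$1" tar -xf "$2" --to-command \
    'if grep -q -e "$PATTERN"; then echo "$TAR_FILENAME"; fi; cat >/dev/null'
}

workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"
printf 'say hello\n' > "$workdir/c.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" a.txt b.txt c.txt

tar_files_matching hello "$workdir/demo.tar.gz"
```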

Answered by cowboydan

Try:

tar tvf name_of_file | grep --regex="pattern"

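The t option lists the contents of the tar file without extracting anything, v makes the listing verbose (permissions, owner, size, and date for each entry), and f names the archive file to read. Because nothing is extracted, this uses no extra disk space; note, however, that it searches only the member names in the listing, not the contents of the files.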
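
To see what grepping the listing can and cannot find, here is a small sketch on a throwaway archive (the file name and contents are invented for the demo):

```shell
#!/usr/bin/env bash
# Sketch: tar -t lists member names without extracting, so grepping the
# listing matches names, never file contents.
set -eu

workdir=$(mktemp -d)
printf 'the secret word\n' > "$workdir/notes.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" notes.txt

# Found: "notes" is part of the member name.
name_hits=$(tar -tf "$workdir/demo.tar" | grep -c notes || true)
# Not found: "secret" appears only inside the file's contents.
content_hits=$(tar -tf "$workdir/demo.tar" | grep -c secret || true)

echo "name matches: $name_hits, content matches: $content_hits"
```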
