bash: split one file into multiple files based on a pattern

Question

Asked by jaypal singh

I have a binary file which I convert into a regular text file using hexdump and a few awk and sed commands. The output file looks something like this:

$cat temp
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000
000000087d3f513000000000000000000000000000000000001001001010f000000000026 
58783100b354c52658783100b43d3d0000ad6413400103231665f301010b9130194899f2f
fffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f433031000000000004
6363070000000000000000000000000065450000b4fb6b4000393d3d1116cdcc57e58287d
3f55285a1084b

The temp file has a few eye-catchers (3d3d) which don't repeat very often. They denote the start of a new binary record. I need to split the file at those eye-catchers.

My desired output is to have multiple files (based on the number of eyecatchers in my temp file).

So my output would look something like this -

$cat temp1
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e582000000000000000
0000000000087d3f513000000000000000000000000000000000001001001010f00000000
002658783100b354c52658783100b4

$cat temp2
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc0
15800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000
000000000065450000b4fb6b400039

$cat temp3
3d3d1116cdcc57e58287d3f55285a1084b

Answer 1

Accepted answer by rob mayoff

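#!/usr/bin/perl

# Slurp the whole input at once instead of reading it line by line.
undef $/;
$_ = <>;
$n = 0;

# Split before every "3d3d"; the zero-width lookahead keeps the marker
# at the start of each piece. Pieces go to temp1, temp2, ...
for $match (split(/(?=3d3d)/)) {
    open(O, '>temp' . ++$n) or die "cannot open temp$n: $!";
    print O $match;
    close(O);
}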

Answer 2

Answered by Michael J. Barber

The RS variable in awk is nice for this, allowing you to define the record separator. Thus, you just need to capture each record in its own temp file. The simplest version is:

cat temp |
  awk -v RS="3d3d" '{ print $0 > "temp" NR }'

The sample text starts with the eye-catcher 3d3d, so temp1 will be an empty file. Further, the eye-catcher itself won't be at the start of the temp files, as was shown for the temp files in the question. Finally, if there are a lot of records, you could run into the system limit on open files. Some minor complications will bring it closer to what you want and make it safer:

cat temp |
  awk -v RS="3d3d" 'NR > 1 { print RS $0 > "temp" (NR-1); close("temp" (NR-1)) }'
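
The three adjustments described in this answer (skip the empty first record, put the eye-catcher back in front of each record, close each file after writing) can be sketched on a small sample as follows. The directory demo_rs, the prefix variable and the sample data are hypothetical, and the sketch assumes an awk that accepts a multi-character RS, such as gawk or mawk.

```shell
#!/bin/sh
# Scratch directory (hypothetical name) so the demo does not touch real files.
mkdir -p demo_rs

# Made-up single-line sample holding three records.
printf '3d3dAAAA3d3dBBBB3d3dCC\n' > demo_rs/input

# NR > 1 skips the empty record in front of the first 3d3d.
# RS is printed back in front of each record.
# close() releases each output file as soon as it has been written.
awk -v RS="3d3d" -v prefix="demo_rs/temp" 'NR > 1 {
    out = prefix (NR - 1)
    print RS $0 > out
    close(out)
}' demo_rs/input
```

Note that the last file also carries the trailing newline of the input, since with a multi-character RS that newline is part of the final record.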

Answer 3

Answered by potong

This might work:

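# sed 's/3d3d/\n&/2g' temp | split -dl1 - temp
# ls
temp temp00  temp01  temp02
# cat temp00
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000000000087d3f513000000000000000000000000000000000001001001010f000000000026 58783100b354c52658783100b4
# cat temp01
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000000000000065450000b4fb6b400039
# cat temp02
3d3d1116cdcc57e58287d3f55285a1084b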

EDIT:

If there are newlines in the source file you can remove them first by using tr -d '\n' <temp and then pipe the output through the above sed command. If however you wish to preserve them then:

 sed 's/3d3d/\n&/g;s/^\n\(3d3d\)/\1/' temp |csplit -zf temp - '/^3d3d/' {*}
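
The newline-stripping variant mentioned in this answer (tr, then sed, then split) can be tried on a small sample like this. It is only a sketch: the directory demo_split and the sample data are made up, and it assumes GNU sed (for the 2g flag and \n in the replacement) and GNU split (for -d).

```shell
#!/bin/sh
# Scratch directory (hypothetical name) so the demo does not touch real files.
mkdir -p demo_split

# Made-up sample: three records, the second one wrapped across two lines.
printf '3d3dAAAA3d3dBB\nBB3d3dCC\n' > demo_split/input

# 1. tr joins everything into one line.
# 2. sed puts a newline in front of every 3d3d from the second match on.
# 3. split writes one line per file, with numeric suffixes 00, 01, ...
tr -d '\n' < demo_split/input |
    sed 's/3d3d/\n&/2g' |
    split -dl1 - demo_split/temp
```

The pieces end up in demo_split/temp00, demo_split/temp01 and demo_split/temp02, each starting with 3d3d.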

Should do the trick

Answer 4

Answered by mLuby

Mac OS X answer

This is for Mac OS X, where that nice awk -v RS="pattern" trick doesn't work. Here's what I got working:

Given this example concatted.txt:

filename=foo bar
foo bar line1
foo bar line2
filename=baz qux
baz qux line1
baz qux line2

Use this command (remove the comments first, to prevent it from failing):

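# cat: useless use of cat ^__^;
# tr: replace all newlines with delimiter1 (which must not be in concatted.txt) so we have one line of all the text
# sed: replace file start pattern with delimiter2 (which must not be in concatted.txt) so we know where to split out each file
# tr: replace delimiter2 with NULL character since sed can't do it
# xargs: split giant single-line input on NULL character and pass 1 line (= 1 file) at a time to echo into the pipe
# sed: get all but last line (same as head -n -1) because there's an extra since concatted-file.txt ends in a NULL character.
# awk: does a bunch of stuff as the final command. Remember it's getting a single line to work with.
#   {replace all delimiter1s in file with newlines (in place)}
#   {match regex (sets RSTART and RLENGTH) then set filename to regex match (might end at delimiter1). Note in this case the number 9 is the length of "filename=" and the 2 removes the "§" }
#   {write file to filename and close the file (to avoid "too many files open" error)}
cat ../concatted-file.txt \
| tr '\n' '§' \
| sed 's/filename=/?filename=/g' \
| tr '?' '\0' \
| xargs -t -0 -n1 echo \
| sed '$d' \
| awk '{match($0, /filename=[^§]+§/)} {filename=substr($0, RSTART+9, RLENGTH-9-2)".txt"} {gsub(/§/, "\n", $0)} {print $0 > filename; close(filename)}'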

This results in two files named foo bar.txt and baz qux.txt respectively:

filename=foo bar
foo bar line1
foo bar line2

filename=baz qux
baz qux line1
baz qux line2
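
As background on why the RS trick can fail: POSIX leaves the behaviour of a multi-character RS unspecified, and some awk implementations use only its first character. For the original 3d3d problem, a portable sketch that avoids RS altogether could look like the following. It uses only index(), substr() and length(); the directory demo_portable, the variables sep and prefix, and the sample data are made up for illustration.

```shell
#!/bin/sh
# Scratch directory (hypothetical name) so the demo does not touch real files.
mkdir -p demo_portable

# Made-up sample: three records, the second one wrapped across two lines.
printf '3d3dAAAA3d3dBB\nBB3d3dCC\n' > demo_portable/input

# Join the lines, then cut the single long line at every occurrence of sep.
tr -d '\n' < demo_portable/input |
awk -v sep="3d3d" -v prefix="demo_portable/temp" '
{
    rest = $0
    start = index(rest, sep)
    if (start == 0) exit           # no eye-catcher at all: nothing to write
    rest = substr(rest, start)     # drop anything before the first eye-catcher
    while (rest != "") {
        # Search for the next eye-catcher after the one at the start of rest.
        nxt = index(substr(rest, length(sep) + 1), sep)
        if (nxt == 0) {
            record = rest
            rest = ""
        } else {
            record = substr(rest, 1, length(sep) + nxt - 1)
            rest = substr(rest, length(sep) + nxt)
        }
        out = prefix (++n)
        print record > out
        close(out)                 # avoid running out of open files
    }
}'
```

This trades the convenience of RS for a small loop, and it reads the whole input as one line, so very large inputs may hit line-length limits in some awk implementations.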

Hope this helps!

Answer 5

Answered by Zsolt Botykai

It depends on whether your temp file is a single line or not. Assuming it is a single line, you can go with:

sed 's/\(.\)\(3d3d\)/\1#\2/g' FILE | awk -F "#" '{ for (i=1; i<=NF; i++) { print $i > "temp" i } }'

The first sed inserts a # as a field/record separator, then awk splits on # and prints every "field" to its own file.

If the input file is already split on 3d3d (each record starts on its own line), then you can go with:

awk '/^3d3d/ { i++ } { print > "temp" i }' temp
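
A small sketch of the line-based case, where every record already starts on its own line with 3d3d and may continue over several lines. The directory demo_lines, the prefix variable and the sample data are made up for illustration; closing the previous file when a new record starts is an addition of this sketch, meant to stay under the open-file limit.

```shell
#!/bin/sh
# Scratch directory (hypothetical name) so the demo does not touch real files.
mkdir -p demo_lines

# Made-up sample: three records, two of them spanning two lines.
printf '3d3dAAAA\nBBBB\n3d3dCCCC\n3d3dDD\nEE\n' > demo_lines/input

awk -v prefix="demo_lines/temp" '
/^3d3d/ {
    if (out != "") close(out)   # finished with the previous record
    out = prefix (++i)          # start the next output file
}
out != "" { print > out }       # lines before the first 3d3d are skipped
' demo_lines/input
```

Here demo_lines/temp1 receives the first two lines, demo_lines/temp2 the third, and demo_lines/temp3 the last two.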

HTH
