bash: Split a text file into parts based on a pattern taken from the text file

Question

Asked by a different ben

I have many text files of fixed-width data, e.g.:

$ head model-q-060.txt 
% x                      y                        
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
15.07826                 -3.0                     
15.104348                -4.0                     
15.130435                -5.0                     
15.156522                -6.0                     
15.182609                -6.9999995               
15.208695                -8.0

The data comprise 3 or 4 runs of a simulation, all stored in the one text file, with no separator between runs. In other words, there is no empty line or anything, e.g. if there were only 3 'records' per run it would look like this for 3 runs:

$ head model-q-060.txt 
% x                      y                        
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
15.0                     0.0                      
15.038486                -1.0                     
15.066712                -2.0                     
15.0                     0.0                      
15.041089                -1.0                     
15.087612                -2.0

It's a COMSOL Multiphysics output file for those interested. Visually you can tell where the new run data begin, as the first x-value is repeated (actually the entire second line might be the same for all of them). So I need to firstly open the file and get this x-value, save it, then use it as a pattern to match with awk or csplit. I am struggling to work this out!

csplit will do the job:

$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt '/^15\.0\s/' '{*}'

but I have to know the pattern to split on. This question is similar but each of my text files might have a different pattern to match: Split files based on file content and pattern matching.

Ben.

Answer 1

Accepted answer by Jim Garrison

Here's a simple awk script that will do what you want:

BEGIN { fn=0 }
NR==1 { next }
NR==2 { delim=$1 }
$1 == delim {
    f=sprintf("test%02d.txt",fn++);
    print "Creating " f
}
{ print $0 > f }

initialize output file number
ignore the first line
extract the delimiter from the second line
for every input line whose first token matches the delimiter, set up the output file name
for all lines, write to the current output file
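Since the question mentions many input files, the steps above can be wrapped in a small shell function and applied to each file in turn. This is only a sketch: the function name split_runs and the `<base>.runNN` output naming are hypothetical choices, not part of the answer. The delimiter is read afresh from the second line of every file, so each file may use a different first x-value.

```shell
#!/usr/bin/env bash
# Sketch: run the awk logic on one file, naming outputs after the input.
# split_runs and the "<base>.runNN" naming are illustrative assumptions.
split_runs() {
    local infile=$1
    local base=${infile%.txt}
    awk -v base="$base" '
        NR == 1 { next }                  # skip the "% x y" header line
        NR == 2 { delim = $1 }            # first x-value marks the start of a run
        $1 == delim {
            if (f != "") close(f)         # close the previous output file
            f = sprintf("%s.run%02d", base, fn++)
        }
        { print > f }                     # write every data line to the current file
    ' "$infile"
}

# Apply it to every matching file in the current directory.
for file in model-q-*.txt; do
    [ -e "$file" ] || continue            # the glob matched nothing
    split_runs "$file"
done
```

The outputs deliberately do not end in .txt, so re-running the loop does not pick up files produced by an earlier run.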

Answer 2

Answer by icyrock.com

This should do the job. Test it in a directory where you don't already have a lot of temp*.txt files: :)

rm -f temp*.txt

cat > f1.txt <<EOF
% x                      y                        
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
15.0                     0.0                      
15.038486                -1.0                     
15.066712                -2.0                     
15.0                     0.0                      
15.041089                -1.0                     
15.087612                -2.0    
EOF

first=$(awk 'NR==2{print $1}' f1.txt | sed 's/\./\\./')
echo "--- Splitting by: $first"

csplit -z -f temp -b %02d.txt f1.txt "/^$first\s/" '{*}'

for i in temp*.txt; do
  echo "---- $i"
  cat "$i"
done

The output of the above is:

--- Splitting by: 15\.0
51
153
153
136
---- temp00.txt
% x                      y                        
---- temp01.txt
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
---- temp02.txt
15.0                     0.0                      
15.038486                -1.0                     
15.066712                -2.0                     
---- temp03.txt
15.0                     0.0                      
15.041089                -1.0                     
15.087612                -2.0

Of course, you will run into trouble if the value used as the delimiter (15.0 in the above example) also appears in the first column somewhere inside a run. Solving that would be a tad harder; exercise left for the reader...
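One way to make such accidental matches less likely is to compare both columns against the second line instead of the x-value alone, given the question's remark that the entire second line might be the same for every run. The following is a sketch under that assumption; split_on_first_record and the runNN.txt names are hypothetical, and it uses awk rather than csplit so that no regex escaping is needed. It still splits wrongly if a whole record repeats inside a run.

```shell
# Sketch: start a new output file only when BOTH columns equal those of the
# first data line. Function and file names are illustrative assumptions.
split_on_first_record() {
    awk '
        NR == 1 { next }                       # skip the header line
        NR == 2 { key = $1 " " $2 }            # remember the first record
        ($1 " " $2) == key {
            if (out != "") close(out)
            out = sprintf("run%02d.txt", n++)
        }
        { print > out }
    ' "$1"
}
```

Comparing the fields rather than the raw line keeps the match insensitive to differences in trailing padding between rows.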

Answer 3

Answer by Blackle Mori

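If the number of lines per run is constant, you could use this (replace number_of_runs with the number of runs in the file):

cat your_file.txt | grep -P "^\d" | \
   split --lines=$(expr \( $(wc -l "your_file.txt" | \
   awk '{print $1}') - 1 \) / number_of_runs)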
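The arithmetic behind this approach can be spelled out step by step. The sketch below assumes the file has exactly one header line and ends with a newline; split_fixed, its arguments, and the run_ prefix are hypothetical names. It refuses to split when the data lines do not divide evenly, which would indicate that the runs are not all the same length.

```shell
# Sketch: split a file into equal-sized runs after dropping the header.
# split_fixed and the run_ prefix are illustrative assumptions.
split_fixed() {
    local infile=$1 runs=$2
    local total per_run
    total=$(( $(wc -l < "$infile") - 1 ))      # data lines, header excluded
    if [ $(( total % runs )) -ne 0 ]; then
        echo "error: $total data lines do not divide evenly into $runs runs" >&2
        return 1
    fi
    per_run=$(( total / runs ))
    # Drop the header, then cut into pieces named run_aa, run_ab, ...
    tail -n +2 "$infile" | split -l "$per_run" - run_
}
```

For example, `split_fixed model-q-060.txt 3` on a file with 9 data lines would write three files of 3 lines each.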
