bash - Split text file into parts based on a pattern taken from the text file

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/9476018/
Asked by a different ben
I have many text files of fixed-width data, e.g.:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.07826 -3.0
15.104348 -4.0
15.130435 -5.0
15.156522 -6.0
15.182609 -6.9999995
15.208695 -8.0
The data comprise 3 or 4 runs of a simulation, all stored in one text file with no separator between runs. In other words, there is no blank line or anything else; e.g. if there were only 3 'records' per run, it would look like this for 3 runs:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
It's a COMSOL Multiphysics output file, for those interested. Visually you can tell where the data for a new run begin, because the first x-value repeats (in fact the entire second line is probably identical for all runs). So I need to first open the file, read this x-value, and save it, then use it as a pattern to match with awk or csplit. I am struggling to work this out!
csplit will do the job:
$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt /^15\.0\s/ {*}
but I have to know the pattern to split on. This question is similar but each of my text files might have a different pattern to match: Split files based on file content and pattern matching.
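The pattern the question wants to split on can itself be read from the file. A minimal sketch, assuming the layout shown above (sample.txt is a stand-in file name for illustration):

```shell
# Build a small stand-in sample in the format described above
printf '%s\n' '% x y' '15.0 0.0' '15.026087 -1.0' '15.0 0.0' '15.038486 -1.0' > sample.txt

# The first token of line 2 is the x-value that marks the start of each run;
# escape its dot so it is safe to reuse as a csplit/awk regular expression
pattern=$(awk 'NR==2 {print $1; exit}' sample.txt | sed 's/\./\\./g')
printf '%s\n' "$pattern"   # prints 15\.0
```

With the extracted value in hand, it can be interpolated into the csplit pattern instead of hard-coding /^15\.0\s/.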
Ben.
Accepted answer by Jim Garrison
Here's a simple awk script that will do what you want:
BEGIN { fn=0 }
NR==1 { next }
NR==2 { delim=$1 }
$1 == delim {
    f=sprintf("test%02d.txt",fn++);
    print "Creating " f
}
{ print $0 > f }

- BEGIN: initialize output file number
- NR==1: ignore the first line
- NR==2: extract the delimiter from the second line
- $1 == delim: for every input line whose first token matches the delimiter, set up the output file name
- last block: for all lines, write to the current output file

Answer by icyrock.com

This should do the job - test somewhere you don't have a lot of temp*.txt files: :)

rm -f temp*.txt
cat > f1.txt <<EOF
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
EOF
first=`awk 'NR==2{print $1}' f1.txt|sed 's/\./\\./'`
echo --- Splitting by: $first
csplit -z -f temp -b %02d.txt f1.txt /^"$first"\s/ {*}
for i in temp*.txt; do
  echo ---- $i
  cat $i
done

The output of the above is:

--- Splitting by: 15\.0
51
153
153
136
---- temp00.txt
% x y
---- temp01.txt
15.0 0.0
15.026087 -1.0
15.052174 -2.0
---- temp02.txt
15.0 0.0
15.038486 -1.0
15.066712 -2.0
---- temp03.txt
15.0 0.0
15.041089 -1.0
15.087612 -2.0

Of course, you will run into trouble if you have a repeating second-column value (15.0 in the above example) - solving that would be a tad harder - exercise left for the reader...

Answer by Blackle Mori

If the amount of lines per run is constant, you could use this:

cat your_file.txt | grep -P "^\d" | \
split --lines=$(expr \( $(wc -l "your_file.txt" | \
awk '{print $1}') - 1 \) / number_of_runs)
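For concreteness, here is a runnable version of that line-count approach on the nine-row sample from earlier. The run_ output prefix is my choice, and grep -E '^[0-9]' replaces grep -P "^\d" since -P is a GNU extension:

```shell
# Recreate the sample file: 1 header line + 3 runs of 3 lines each
printf '%s\n' '% x y' \
  '15.0 0.0' '15.026087 -1.0' '15.052174 -2.0' \
  '15.0 0.0' '15.038486 -1.0' '15.066712 -2.0' \
  '15.0 0.0' '15.041089 -1.0' '15.087612 -2.0' > f1.txt

number_of_runs=3
# (total lines - 1 header line) / number of runs = lines per run (3 here)
per_run=$(( ($(wc -l < f1.txt) - 1) / number_of_runs ))

# Drop the header (keep only lines starting with a digit), then split
# the remainder into equal chunks named run_aa, run_ab, run_ac
grep -E '^[0-9]' f1.txt | split --lines="$per_run" - run_

wc -l run_*
```

Unlike the awk and csplit solutions, this one never looks at the repeated x-value, so it silently produces wrong splits if the runs are not all the same length.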
