Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8272017/

Time: 2020-08-06 02:34:07  Source: igfitidea

Split files based on file content and pattern matching

Tags: linux, perl, bash, pattern-matching

Asked by Dean

I need your help formatting a txt file using bash/Linux. The file looks like the following: it always has a line of the form Rate: Sth, followed by the details in a very specific format. I'd like to split the file up so that each rate goes into its own file. In this example, I'd like to have 3 files, each containing the corresponding line that says what the Rate value was.

How will you approach this?

line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

Answered by sehe

I'd do this in perl:

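#!/usr/bin/perl

use strict;
use warnings;

# Until the first "Rate:" line is seen, write to standard output.
open (my $out, ">-") or die "oops";

while(<>)
{
    if (m/^Rate: (\w+)/o)
    {
        # Start a new output file named after the rate, e.g. "GBP".
        close $out and open ($out, ">", $1) or die "oops";
        next;   # the "Rate:" line itself is not copied to the output
    }

    print $out $_
}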

Use it like

perl ./test.pl input.txt

Answered by choroba

Another solution: It just makes your input file into a script and then runs it:

sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash

I assumed the line numbers are not part of the file.
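Since this approach executes whatever script sed generates, it may be worth looking at that script before piping it to bash. The following is only a sketch, not part of the original answer: preview_split_script is a hypothetical helper name, and it assumes GNU sed (the one-line form of the a command and \n in the replacement are GNU extensions).

```shell
#!/bin/sh
# preview_split_script FILE (hypothetical helper, assumes GNU sed):
# prints the script generated from FILE instead of running it.
preview_split_script() {
    # 1) turn every "Rate: XXX" line into "cat <<EOF > XXX"
    # 2) on every line but the first, close the previous here-document first
    # 3) close the last here-document after the final line
    sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' "$1"
}
```

Run it as preview_split_script input.txt, check the result, and only then append | bash. Note that the here-document delimiter is unquoted, so bash would expand any $ or backticks occurring in the data, and a data line consisting only of EOF would end a section early.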

Answered by jaypal singh

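You can use something like this in Perl:

Perl Script:

#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input at once instead of reading it line by line.
local $/;
my $text = <>;
my $n = 0;

# Split in front of every "Rate"; the lookahead keeps the word in each chunk.
for my $chunk (split /(?=Rate)/, $text) {
      my $name = 'temp' . ++$n;
      open(my $fh, '>', $name) or die "Cannot open $name: $!";
      print $fh $chunk;
      close($fh);
}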

Execution:

[jaypal~/temp]$ ./spl.pl temp.file

[jaypal~/temp]$ cat temp.file
Line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

[jaypal~/temp]$ cat temp1
Line No. Main Text
1    

[jaypal~/temp]$ cat temp2
Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated

211  

[jaypal~/temp]$ cat temp3
Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated

1002 [jaypal~/temp]$ cat temp4
Rate: USD
1003 21/11/11,-0.004419534,Validated
[jaypal~/temp]$ 

Answered by Zsolt Botykai

(g)awk to the rescue:

awk '/^Rate:/ { output_file_name = $2; getline }
     { print $0 >> ( output_file_name ) }' INPUT_FILE

The first rule runs for lines that start with Rate:; it only sets the output file name and then fetches the next line from the input file. That next line is then processed by the second rule and written to the output file. Every following line is handled by the second rule alone (it gets written to the output file), as long as it does not match Rate:.

NOTE: The above solution might fail if the input file contains a section with two consecutive Rate: lines, because getline then consumes the second Rate: line and it is written as data to the first file. For example:

... DATA ...
Rate: GBP
Rate: CHF
... DATA ...

Otherwise, the awk command above should do (assuming that the line numbers are not part of the original file).
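For inputs that do contain consecutive Rate: lines, one possible variant drops getline and simply switches the output file whenever a Rate: line is seen. This is a minimal sketch, not taken from the answer above: the helper name split_by_rate and the .txt suffix are assumptions made for illustration, and unlike the original command it keeps the Rate: line as the first line of each output file.

```shell
#!/bin/sh
# split_by_rate FILE (hypothetical helper): writes one <rate>.txt per
# "Rate:" section of FILE into the current directory.
split_by_rate() {
    awk '
        /^Rate:/ {
            if (out != "") close(out)   # finish the previous section
            out = $2 ".txt"             # "Rate: GBP" -> GBP.txt
        }
        # Copy every line, including the Rate: line itself, to the current
        # file; lines before the first Rate: line are skipped.
        out != "" { print > out }
    ' "$1"
}
```

Usage would be split_by_rate INPUT_FILE. Because the file is opened with >, a rate that appears in two separate sections of the same input would be overwritten by its later section; use >> instead if the sections should be concatenated.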

HTH

Answered by TLP

A one-liner inspired by sehe's answer:

>perl -pwe '
> if (/^Rate: (.+)/) {
>    open $out, ">", "Rate_$1.txt" or die $!;
>    select $out;
> }' gasdata.txt

The -p option will read a line and print it after the code in -e is evaluated. select will choose a default filehandle for print. So, basically, what we are doing is simply juggling the filehandle around, depending on which Rate is currently the active one.

Here's the code deparsed:

>perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    if (/^Rate: (.+)/) {
        die $! unless open $out, '>', "output/Rate_$1.txt";
        select $out;
    }
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

Answered by potong

This might work for you:

csplit -z -f 'temp' -b '%02d.txt' file /Rate/ {*}

This will produce files temp00.txt, temp01.txt...

If you only want the Rate line then:

sed -i '/Rate/!d' temp*.txt
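If the numbered pieces temp00.txt, temp01.txt... should be named after their rate instead, a possible follow-up step is to read the first line of each piece and rename the file accordingly. This is only a sketch and not part of the original answer: rename_by_rate is a hypothetical helper name, the Rate_ prefix is an arbitrary choice, and it assumes every piece starts with a Rate: XXX line.

```shell
#!/bin/sh
# rename_by_rate (hypothetical helper): renames temp*.txt pieces in the
# current directory to Rate_<rate>.txt, based on their first line.
rename_by_rate() {
    for f in temp*.txt; do
        [ -f "$f" ] || continue
        # Extract the rate code from line 1; empty if it is not a Rate: line.
        rate=$(sed -n '1s/^Rate: *//p' "$f")
        if [ -n "$rate" ]; then
            mv -- "$f" "Rate_$rate.txt"
        fi
    done
}
```

Pieces whose first line is not a Rate: line are left untouched, so a leading header chunk would keep its numbered name.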