Original question: http://stackoverflow.com/questions/1212799/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
How do I extract lines between two line delimiters in Perl?
Asked by jbatista
I have an ASCII log file with some content I would like to extract. I've never taken time to learn Perl properly, but I figure this is a good tool for this task.
The file is structured like this:
    ...
    ... some garbage
    ...
    ... garbage START
    what i want is
    on different
    lines
    END
    ...
    ... more garbage ...
    next one START
    more stuff I want, again
    spread
    through
    multiple lines
    END
    ...
    more garbage
So, I'm looking for a way to extract the lines between each START and END delimiter string. How can I do this?
So far, I've only found some examples of how to print a line containing the START string, and other documentation items that are somewhat related to what I'm looking for.
Answer by Telemachus
You want the flip-flop operator (better known as the range operator) ..
    #!/usr/bin/env perl
    use strict;
    use warnings;

    while (<>) {
        if (/START/../END/) {
            next if /START/ || /END/;
            print;
        }
    }
Replace the call to print with whatever you actually want to do (e.g., push the line into an array, edit it, format it, whatever). I'm next-ing past the lines that actually have START or END, but you may not want that behavior. See this article for a discussion of this operator and other useful Perl special variables.
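As an illustration of the "push the line into an array" option, here is a minimal sketch. The helper name extract_blocks and the sample data are made up for this example, and it assumes the START and END markers always sit on separate lines:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns one array
# reference per START..END block, with the delimiter lines left out.
# Assumes START and END never appear on the same line.
sub extract_blocks {
    my @lines = @_;
    my (@blocks, $current);
    for (@lines) {
        if (/START/ .. /END/) {
            if (/START/) { $current = []; next; }            # a new block begins
            if (/END/)   { push @blocks, $current; next; }   # the block is complete
            push @$current, $_;                              # a line inside the block
        }
    }
    return @blocks;
}

# Made-up sample data in the shape described in the question
my @log = ("garbage\n", "junk START\n", "a\n", "b\n", "END\n",
           "noise\n", "next one START\n", "c\n", "END\n");

my @blocks = extract_blocks(@log);
print scalar(@blocks), " block(s) found\n";
print "---\n", @$_ for @blocks;
```

Keeping each block in its own array makes it easy to edit or format the lines later, at the cost of holding them all in memory.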
Answer by brian d foy
From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?
You can use Perl's somewhat exotic .. operator (documented in perlop):
perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
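For the non-nested case, the slurp-and-match idea can also be written as a small script. This is only a sketch: the helper name extract_text and the sample string are assumptions made for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the text found between each START and END.
sub extract_text {
    my ($content) = @_;
    # /s lets . match newlines, /g finds every pair,
    # and .*? stops at the nearest END rather than the last one.
    my @chunks = $content =~ /START(.*?)END/gs;
    return @chunks;
}

# Made-up sample input held in a single string
my $log = "junk START\nfirst\nEND junk\nnext START\nsecond\nEND\n";
print "<$_>\n" for extract_text($log);
```

To run it on a file, the contents would have to be read into one string first, for example by leaving $/ undefined while reading the handle, which is what -0777 does on the command line.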
Here's another example of using ..:
    while (<>) {
        $in_header = 1 .. /^$/;
        $in_body = /^$/ .. eof;
        # now choose between them
    } continue {
        $. = 0 if eof; # fix $.
    }
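Returning to the START/END case: perlop documents that, in scalar context, .. returns an empty string while the range is false and a sequence number starting at 1 while it is true, with "E0" appended to the last number of each range. A sketch that uses this value to drop the delimiter lines without matching them a second time (the helper name lines_between and the sample data are made up for this example):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between START and END.
sub lines_between {
    my @out;
    for (@_) {
        my $seq = /START/ .. /END/;   # "" outside a block, 1, 2, ... inside
        next unless $seq;             # not inside any block
        next if $seq == 1;            # first line of the range: the START line
        next if $seq =~ /E0$/;        # last line of the range: the END line
        push @out, $_;
    }
    return @out;
}

# Made-up sample data
my @log = ("garbage\n", "junk START\n", "a\n", "b\n", "END\n", "more garbage\n");
print lines_between(@log);
```

Compared with re-testing /START/ and /END/ on each line, this keeps lines inside a block that merely happen to contain one of the marker words.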
Answer by dala
Not too bad for coming from a "virtual newcomer". One thing you could do is put the "$found=1" inside the "if($found == 0)" block, so that you don't do that assignment on every line between $start and $stop.
Another thing that is a bit ugly, in my opinion, is that you open the same filehandle each time you enter the $start/$stop block.
This shows a way around that:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $start='CINFILE=$';
    my $stop='^#$';
    my $filename;
    my $output;
    my $counter=1;
    my $found=0;

    while (<>) {
        # Find block of lines to extract
        if( /$start/../$stop/ ) {
            # Start of block
            if( /$start/ ) {
                $filename=sprintf("boletim_%06d.log",$counter);
                open($output,'>>'.$filename) or die $!;
            }
            # End of block
            elsif ( /$stop/ ) {
                close($output);
                $counter++;
                $found = 0;
            }
            # Middle of block
            else{
                if($found == 0) {
                    print $output (split(/ /))[1];
                    $found=1;
                }
                else {
                    print $output $_;
                }
            }
        }
    }
Answer by Dirk
How can I grab multiple lines after a matching line in Perl?
How's that one? In that one, the END string is $^; you can change it to your END string.
I am also a novice, but the solutions there provide quite a few methods... let me know more specifically what it is you want that differs from the above link.
Answer by ghostdog74
    while (<>) {
        chomp; # strip record separator
        if(/END/) { $f=0;}
        if (/START/) {
            s/.*START//g;
            $f=1;
        }
        print $_ ."\n" if $f;
    }
Try to write some code next time round.
Answer by jbatista
After Telemachus' reply, things started pouring out. This works as the solution I was looking for after all.
- I'm trying to extract the lines between two delimiter lines (one ending with "CINFILE=", the other containing only a single "#"), excluding the delimiter lines themselves. This I can do with Telemachus' solution.
- The first line has a space I want to remove. I'm also including it.
- I'm also trying to extract each line-set into separate files.
This works for me, although the code can be classified as ugly; this is because I'm currently a virtual newcomer to Perl. Anyway, here goes:
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $start='CINFILE=$';
    my $stop='^#$';
    my $filename;
    my $output;
    my $counter=1;
    my $found=0;

    while (<>) {
        if (/$start/../$stop/) {
            $filename=sprintf("boletim_%06d.log",$counter);
            open($output,'>>'.$filename) or die $!;
            next if /$start/ || /$stop/;
            if($found == 0) { print $output (split(/ /))[1]; }
            else { print $output $_; }
            $found=1;
        } else {
            if($found == 1) { close($output); $counter++; $found=0; }
        }
    }
I hope it benefits others as well. Cheers.