Warning: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverFlow. Original address: http://stackoverflow.com/questions/953707/

Time: 2020-09-09 00:27:35  Source: igfitidea

In Perl, how can I read an entire file into a string?

Tags: string, perl, slurp

Asked by goddamnyouryan

I'm trying to open an .html file as one big long string. This is what I've got:

open(FILE, 'index.html') or die "Can't read file 'filename' [$!]\n";  
$document = <FILE>; 
close (FILE);  
print $document;

which results in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN

However, I want the result to look like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This way I can search the entire document more easily.

Accepted answer by Sinan Ünür

Add:

 local $/;

before reading from the file handle. See How can I read in an entire file all at once?, or

$ perldoc -q "entire file"

See Variables related to filehandles in perldoc perlvar and perldoc -f local.
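
To make the effect of $/ concrete, here is a minimal sketch. It reads the same file twice: once with the default record separator and once with the separator undefined inside a do block. The temporary file and its three lines are assumptions made up for this illustration, not part of the original question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small three-line sample file (hypothetical content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "line one\nline two\nline three\n";
close $out or die "could not close $filename: $!";

# Default separator ("\n"): a scalar-context read stops after the first line.
open my $fh, '<', $filename or die "could not open $filename: $!";
my $first = <$fh>;
close $fh;

# Separator undefined inside the block: the same read returns everything.
open $fh, '<', $filename or die "could not open $filename: $!";
my $whole = do { local $/; <$fh> };
close $fh;

print "first read: $first";
print "slurped length: ", length($whole), "\n";
```

Because local restores the previous value when the block exits, code that runs afterwards still reads line by line.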

Incidentally, if you can put your script on the server, you can have all the modules you want. See How do I keep my own module/library directory?.

In addition, Path::Class::File allows you to slurp and spew.

Path::Tiny gives even more convenience methods such as slurp, slurp_raw, slurp_utf8 as well as their spew counterparts.

Answer by Chas. Owens

I would do it like this:

my $file = "index.html";
my $document = do {
    local $/ = undef;
    open my $fh, "<", $file
        or die "could not open $file: $!";
    <$fh>;
};

Note the use of the three-argument version of open. It is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.
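
The same pattern can be wrapped in a small helper, as a sketch. The name slurp_file and the sample HTML are hypothetical and used only for illustration. The check at the end shows that $/ is back to its default once the sub returns; the lexical handle is released at the same point.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name, not a built-in.
sub slurp_file {
    my ($path) = @_;
    local $/;                       # undefined only until this sub returns
    open my $fh, '<', $path
        or die "could not open $path: $!";
    my $content = <$fh>;
    return $content;                # $fh is closed when it goes out of scope
}

# Sample file so the sketch runs on its own (made-up content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "<html>\n<head>\n</head>\n</html>\n";
close $out or die "could not close $filename: $!";

my $document = slurp_file($filename);
my $separator_restored = (defined $/ && $/ eq "\n") ? 1 : 0;

print $document;
print "separator restored: $separator_restored\n";
```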

Answer by Quentin

With File::Slurp:

use File::Slurp;
my $text = read_file('index.html');

Yes, even you can use CPAN.

Answer by jrockway

All the posts are slightly non-idiomatic. The idiom is:

open my $fh, '<', $filename or die "error opening $filename: $!";
my $data = do { local $/; <$fh> };

Mostly, there is no need to set $/ to undef explicitly; a bare local $/ already leaves it undefined.
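
One detail of the idiom's first line is worth a sketch: it only works without parentheses because it uses the low-precedence or. With || the die would attach to $filename instead of to the result of open. The file name below is an assumption chosen so that the open fails.

```perl
use strict;
use warnings;

# Assumption for this sketch: no file with this name exists.
my $filename = 'no-such-file-for-this-demo.html';

# With `or`, die sees the result of open, so the failure is reported.
my $or_ok = eval {
    open my $fh, '<', $filename or die "error opening $filename: $!";
    1;
};

# With `||`, die is attached to $filename (a true string) instead,
# so the failed open passes silently.
my $pipes_ok = eval {
    open my $fh, '<', $filename || die "error opening $filename: $!";
    1;
};

print "with or: ", ($or_ok    ? "no error" : "died"), "\n";
print "with ||: ", ($pipes_ok ? "no error" : "died"), "\n";
```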

Answer by brian d foy

From perlfaq5: How can I read in an entire file all at once?:

You can use the File::Slurp module to do it in one step.

use File::Slurp;

$all_of_it = read_file($filename); # entire file in scalar
@all_lines = read_file($filename); # one line per element

The customary Perl approach for processing all the lines in a file is to do so one line at a time:

open (INPUT, $file)     || die "can't open $file: $!";
while (<INPUT>) {
    chomp;
    # do something with $_
    }
close(INPUT)            || die "can't close $file: $!";

This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

@lines = <INPUT>;

you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.
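
As a rough illustration of the Tie::File approach mentioned above, the following sketch ties an array to a file and reads one line by index. The temporary file and its contents are assumptions for the example. Tie::File opens the file read-write by default, so the file must be writable.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Sample file so the sketch runs on its own (made-up content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "alpha\nbeta\ngamma\n";
close $out or die "can't close $filename: $!";

# Each array element maps to one line of the file (without its newline).
tie my @lines, 'Tie::File', $filename
    or die "can't tie $filename: $!";

my $line_count  = scalar @lines;
my $second_line = $lines[1];      # fetched on demand, not preloaded

untie @lines;

print "$line_count lines; second line is '$second_line'\n";
```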

You can read the entire filehandle contents into a scalar.

{
local(*INPUT, $/);
open (INPUT, $file)     || die "can't open $file: $!";
$var = <INPUT>;
}

That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:

$var = do { local $/; <INPUT> };

For ordinary files you can also use the read function.

read( INPUT, $var, -s INPUT );

The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
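
A slightly more defensive version of the read approach might look like the sketch below. It uses a lexical handle, puts the handle in binary mode so the count matches the byte size that -s reports, and checks the return value of read. The sample file is an assumption for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file so the sketch runs on its own (made-up content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "<html>\n<body>hello</body>\n</html>\n";
close $out or die "can't close $filename: $!";

open my $in, '<', $filename or die "can't open $filename: $!";
binmode $in;                       # count bytes, matching what -s reports
my $size = -s $in;
my $got  = read($in, my $var, $size);
defined $got or die "can't read $filename: $!";
close $in or die "can't close $filename: $!";

print "expected $size bytes, read $got\n";
```

read returns undef on error and the number of bytes actually read otherwise, so comparing $got with $size catches short reads.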

Answer by kixx

Either set $/ to undef (see jrockway's answer) or just concatenate all the file's lines:

$content = join('', <$fh>);

It's recommended to use scalars for filehandles on any Perl version that supports it.

Answer by kixx

A simple way is:

while (<FILE>) { $document .= $_ }

Another way is to change the input record separator "$/". You can do it locally in a bare block to avoid changing the global record separator.

{
    local $/ = undef;
    open(F, "<", "filename") or die "can't open filename: $!";
    $d = <F>;
    close(F);
}

Answer by echo

Another possible way:

open my $fh, '<', "filename" or die "can't open filename: $!";
read $fh, my $string, -s $fh;
close $fh;

Answer by Nathan

You're only getting the first line from the diamond operator <FILE> because you're evaluating it in scalar context:

$document = <FILE>; 

In list/array context, the diamond operator will return all the lines of the file.

@lines = <FILE>;
print @lines;
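
Both contexts can be seen side by side in this small sketch. The temporary file and its contents are assumptions for the illustration. Joining the pieces back together gives the single string the question asks for.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file so the sketch runs on its own (made-up content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "one\ntwo\nthree\n";
close $out or die "can't close $filename: $!";

open my $fh, '<', $filename or die "can't open $filename: $!";
my $first = <$fh>;          # scalar context: a single record
my @rest  = <$fh>;          # list context: every remaining record
close $fh;

# Concatenating the records rebuilds the whole file as one string.
my $document = join '', $first, @rest;

print "first: $first";
print "remaining lines: ", scalar(@rest), "\n";
```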

Answer by jaw

This is more of a suggestion on how NOT to do it. I've just had a bad time finding a bug in a rather big Perl application. Most of the modules had their own configuration files. To read the configuration files as a whole, I found this single line of Perl somewhere on the Internet:

# Bad! Don't do that!
my $content = do{local(@ARGV,$/)=$filename;<>};

It reassigns the line separator as explained before. But it also localizes @ARGV, so the <> operator reads the named file in place of STDIN.

This had at least one side effect that cost me hours to find: It does not close the implicit file handle properly (since it does not call close at all).

For example, doing that:

use strict;
use warnings;

my $filename = 'some-file.txt';

my $content = do{local(@ARGV,$/)=$filename;<>};
my $content2 = do{local(@ARGV,$/)=$filename;<>};
my $content3 = do{local(@ARGV,$/)=$filename;<>};

print "After reading a file 3 times redirecting to STDIN: $.\n";

open (FILE, "<", $filename) or die $!;

print "After opening a file using dedicated file handle: $.\n";

while (<FILE>) {
    print "read line: $.\n";
}

print "before close: $.\n";
close FILE;
print "after close: $.\n";

results in:

After reading a file 3 times redirecting to STDIN: 3
After opening a file using dedicated file handle: 3
read line: 1
read line: 2
(...)
read line: 46
before close: 46
after close: 0

The strange thing is that the line counter $. is increased by one for every file. It's not reset, and it does not contain the number of lines. And it is not reset to zero when opening another file until at least one line is read. In my case, I was doing something like this:

while($. < $skipLines) {<FILE>};

Because of this problem, the condition was false because the line counter was not reset properly. I don't know if this is a bug or simply wrong code... Also, calling close; or close STDIN; does not help.
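
One possible way around the stale counter, sketched here under assumptions, is to ask the handle itself for its line number through the input_line_number method from IO::Handle instead of relying on the global $. variable. The five-line sample file and the $skip_lines value are made up for the illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Sample file so the sketch runs on its own (made-up content).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "line $_\n" for 1 .. 5;
close $out or die "can't close $filename: $!";

# The problematic one-liner: ARGV is read but never explicitly closed.
my $content = do { local (@ARGV, $/) = $filename; <> };

open my $fh, '<', $filename or die "can't open $filename: $!";

# Ask this handle for its own count instead of trusting the global $. ,
# which still describes the last handle that was read (ARGV here).
my $lines_before = $fh->input_line_number;

my $skip_lines = 2;                # hypothetical number of lines to skip
while ($fh->input_line_number < $skip_lines) {
    my $skipped = <$fh>;
    last unless defined $skipped;  # stop at end of file
}
my $next_line = <$fh>;
close $fh or die "can't close $filename: $!";

print "count on the fresh handle: $lines_before\n";
print "first line after skipping: $next_line";
```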

I replaced this unreadable code by using open, string concatenation and close. However, the solution posted by Brad Gilbert also works since it uses an explicit file handle instead.

The three lines at the beginning can be replaced by:

my $content = do{local $/; open(my $f1, '<', $filename) or die $!; my $tmp1 = <$f1>; close $f1 or die $!; $tmp1};
my $content2 = do{local $/; open(my $f2, '<', $filename) or die $!; my $tmp2 = <$f2>; close $f2 or die $!; $tmp2};
my $content3 = do{local $/; open(my $f3, '<', $filename) or die $!; my $tmp3 = <$f3>; close $f3 or die $!; $tmp3};

which properly closes the file handle.
