Original URL: http://stackoverflow.com/questions/3019708/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Simplest way to match array of strings to search in perl?
Question by Ben Dauphinee
What I want to do is check an array of strings against my search string and get the corresponding key so I can store it. Is there a magical way of doing this with Perl, or am I doomed to using a loop? If so, what is the most efficient way to do this?
I'm relatively new to Perl (I've only written 2 other scripts), so I don't know a lot of the magic yet, just that Perl is magic =D
Reference Array: (1 = 'Canon', 2 = 'HP', 3 = 'Sony')
Search String: Sony's Cyber-shot DSC-S600
End Result: 3
Answer by DVK
UPDATE:
Based on the discussion in this question, and depending on your criteria for what counts as "not using a loop", the map-based solution below (see "#1 option") may be the most concise one, provided that you don't consider map a loop. (The short version of the answers: it is a loop as far as implementation and performance go, but not from a language-theory point of view.)
Assuming you don't care whether you get "3" or "Sony" as the answer, you can do it without a loop in a simple case by building a regular expression with "or" logic (|) from the array, like this:
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",@strings);
my @which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
Result from my test run: Sony
The regular expression will (once the variable $combined_search is interpolated by Perl) take the form /(Canon|HP|Sony)/, which is what you want.
This will NOT work as-is if any of the strings contain regex special characters such as | or ). In that case you need to escape them first, for example with quotemeta.
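A minimal sketch of the escaping fix, assuming a hypothetical fourth brand whose name contains the metacharacters +, ( and ). Each entry is passed through quotemeta before the join, so it is matched as literal text:

```perl
use strict;
use warnings;

# Hypothetical brand list: the last entry contains the metacharacters + ( and ).
my @strings   = ("Canon", "HP", "Sony", "A+B (Pro)");
my $search_in = "New A+B (Pro) camera";

# quotemeta backslash-escapes every non-word character, so each entry is
# matched as literal text instead of being interpreted as regex syntax.
my $combined_search = join "|", map { quotemeta($_) } @strings;

my ($found) = $search_in =~ /($combined_search)/;
my $shown = defined $found ? $found : "Not found";
print "$shown\n";    # A+B (Pro)
```

Without quotemeta, "A+B (Pro)" would be read as "one or more A, then B, space, and a capture group", and would not match the literal text.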
NOTE: I personally consider this somewhat cheating, because in order to implement join(), Perl itself must run a loop somewhere inside the interpreter. So this answer may not satisfy your desire to remain loop-less, depending on whether you wanted to avoid a loop for performance reasons or to have cleaner, shorter code.
P.S. To get "3" instead of "Sony", you will have to use a loop: either an obvious one that you write yourself, or one hidden inside a library call that saves you from writing the loop but still runs one underneath.
I will provide 3 alternative solutions.
#1 option: my favorite. Uses "map", which I personally still consider a loop:
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|", @strings);
my @which_found = ($search_in =~ /($combined_search)/);
die "Not found" unless @which_found;   # check before using $which_found[0]
print "$which_found[0]\n";
# Build a name => zero-based position lookup table
my $strings_index = 0;
my %strings_indexes = map { $_ => $strings_index++ } @strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
my $index = 1 + $strings_indexes{ $which_found[0] };
print "$index\n";
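If you need to look up many search strings, one variant of #1 option is to build the regex and the lookup table once and reuse them. This is a minimal sketch, assuming a hypothetical helper name find_brand_index (not part of the original answer) and adding quotemeta so the strings match literally:

```perl
use strict;
use warnings;

my @strings = ("Canon", "HP", "Sony");

# Build the combined regex and the name => 1-based position table once, so
# each later lookup costs one regex match plus one hash lookup.
my $alternation = join "|", map { quotemeta($_) } @strings;
my $combined_re = qr/($alternation)/;
my %position_of;
@position_of{@strings} = (1 .. @strings);

# Hypothetical helper: returns the 1-based position of the brand that occurs
# leftmost in $search_in, or undef (in scalar context) if none does.
sub find_brand_index {
    my ($search_in) = @_;
    my ($found) = $search_in =~ $combined_re or return;
    return $position_of{$found};
}

my $pos = find_brand_index("Sony's Cyber-shot DSC-S600");
print "$pos\n";    # 3
```

The hash slice fills %position_of without an explicit loop in your code, although, as with map, Perl still iterates internally.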
#2 option: Uses a loop hidden behind a nice CPAN library method:
use List::MoreUtils qw(firstidx);
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|", @strings);
my @which_found = ($search_in =~ /($combined_search)/);
die "Not Found!" unless @which_found;
print "$which_found[0]\n";
# firstidx returns the zero-based index of the first element for which the block is true
my $index_of_found = 1 + firstidx { $_ eq $which_found[0] } @strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
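If installing List::MoreUtils is not an option, a similar result can be had with first from the core List::Util module by running it over the indices instead of the strings. This is a sketch of that swap, not the answer's own code. Note that it returns the first array entry found anywhere in the string, whereas the combined regex finds the leftmost occurrence in the string; the two can differ when several brands appear:

```perl
use strict;
use warnings;
use List::Util qw(first);    # core module, unlike List::MoreUtils

my @strings   = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";

# first() walks the indices and returns the first one whose string occurs in
# $search_in (matched literally via \Q...\E), or undef if none does.
my $found = first { $search_in =~ /\Q$strings[$_]\E/ } 0 .. $#strings;

my $shown = defined $found ? $found + 1 : "Not found";
print "$shown\n";    # 3
```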
#3 option: Here's the obvious loop way:
my $found_index = -1;
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
foreach my $index (0..$#strings) {
    # \Q...\E makes regex special characters in the strings match literally
    next if $search_in !~ /\Q$strings[$index]\E/;
    $found_index = $index;
    last; # quit the loop early, which is why I didn't use "map" here
}
# Check $found_index against -1; and if you want "3" instead of "2", add 1.
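A variant of the obvious loop, sketched under the assumption that the array entries are plain text rather than regex patterns, swaps the match for the built-in index() function, which does a substring search and needs no escaping:

```perl
use strict;
use warnings;

my @strings   = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";

# index() returns the position of the substring, or -1 when it is absent.
my $found_index = -1;
for my $i (0 .. $#strings) {
    if (index($search_in, $strings[$i]) >= 0) {
        $found_index = $i;
        last;    # stop at the first match
    }
}

my $result = $found_index >= 0 ? $found_index + 1 : "Not found";
print "$result\n";    # 3
```

Unlike the regex versions, index() is always case-sensitive; comparing lc() of both sides would be one way to relax that.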
Answer by Eric Strom
Here is a solution that builds a regular expression with embedded code to increment the index as perl moves through the regex:
my @brands = qw( Canon HP Sony );
my $string = "Sony's Cyber-shot DSC-S600";
use re 'eval'; # needed to use the (?{ code }) construct
my $index = -1;
# \$index is escaped so the code block refers to the variable, not its current value
my $regex = join '|' => map "(?{ \$index++ })\Q$_" => @brands;
print "index: $index\n" if $string =~ $regex;
# prints 2 (since Perl's array indexing starts with 0)
The code prepended to each brand first increments the index, and then the pattern tries to match the brand itself (escaped with quotemeta, in the form of \Q, to allow for regex special characters in the brand names).
When a brand fails to match, the regex engine moves past the alternation | to the next branch, and the pattern repeats: increment the index, then try the next brand.
If you have multiple strings to match against, be sure to reset $index before each one. Or you can prepend (?{$index = -1}) to the regex string.
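A minimal sketch of the prepended-reset variant applied to several search strings, built on the answer's own construction; the sample strings and the %index_of hash are illustrative assumptions. Because the reset block is concatenated onto the first alternative, it runs every time the engine starts on the alternatives again, which keeps the count right even when the brand is not at the start of the string:

```perl
use strict;
use warnings;
use re 'eval';    # allow (?{ code }) blocks in a pattern built at run time

my @brands = qw( Canon HP Sony );
my $index;

# '(?{ $index = -1 })' is single-quoted, so $index is not interpolated here;
# in the map string, \$index is escaped for the same reason.
my $regex = '(?{ $index = -1 })'
          . join('|', map { "(?{ \$index++ })\Q$_" } @brands);

my %index_of;
for my $string ("Canon EOS 50D", "HP LaserJet 1020", "Buy a Sony camera", "Nikon D90") {
    $index_of{$string} = $string =~ $regex ? $index : undef;
}

for my $string (sort keys %index_of) {
    my $shown = defined $index_of{$string} ? $index_of{$string} : "no match";
    print "$string: $shown\n";
}
```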
Answer by Kavet Kerek
An easy way is just to use a hash and regex:
my $search = "your search string";
my %translation = (
    'canon' => 1,
    'hp'    => 2,
    'sony'  => 3,
);
# "return" assumes this code sits inside a subroutine;
# keys are visited in no particular order
for my $key ( keys %translation ) {
    if ( $search =~ /$key/i ) {
        return $translation{$key};
    }
}
Naturally, the return could just as easily be a print. You can also wrap the whole thing in a while loop:
while (my $search = <>) {
    # $search holds each line read from the files named on the command line, or from STDIN (e.g. text piped to this script)
}
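Putting the pieces of this answer together, a hedged sketch: the loop is wrapped in a hypothetical subroutine lookup_brand so that return is legal, keys are sorted so the result does not depend on hash order, and each key is matched literally with \Q...\E. Reading from STDIN is left as a comment so the example runs on its own:

```perl
use strict;
use warnings;

my %translation = (
    'canon' => 1,
    'hp'    => 2,
    'sony'  => 3,
);

# Hypothetical helper: returns the number for the first (alphabetically)
# brand key found in $search, case-insensitively.
sub lookup_brand {
    my ($search) = @_;
    for my $key ( sort keys %translation ) {
        return $translation{$key} if $search =~ /\Q$key\E/i;
    }
    return;    # no brand found: undef in scalar context
}

# With input from STDIN or files named on the command line:
#   while ( my $search = <> ) { my $id = lookup_brand($search); ... }
my $id = lookup_brand("Sony's Cyber-shot DSC-S600");
my $shown = defined $id ? $id : "Not found";
print "$shown\n";    # 3
```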
Please also take a look at Perl's regex features in perlre, and at Perl's data structures in perlref.
EDIT
As was just pointed out to me, you were trying to steer away from using a loop. Another method would be to use Perl's map function. Take a look here.
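The map idea from the edit can be sketched as follows, assuming the same %translation hash. grep filters the keys and map converts them, so there is no explicit loop in the code, although both still visit every key internally:

```perl
use strict;
use warnings;

my %translation = ( canon => 1, hp => 2, sony => 3 );
my $search      = "Sony's Cyber-shot DSC-S600";

# grep keeps the keys that occur in $search (case-insensitively, matched
# literally via \Q...\E), map turns those keys into their numbers, and the
# list assignment keeps only the first result.
my ($id) = map { $translation{$_} } grep { $search =~ /\Q$_\E/i } sort keys %translation;

my $shown = defined $id ? $id : "Not found";
print "$shown\n";    # 3
```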
Answer by Dave Sherohman
You can also take a look at Regexp::Assemble, which will take a collection of sub-regexes and build a single super-regex from them that can then be used to test for all of them at once (and gives you the text which matched the regex, of course). I'm not sure that it's the best solution if you're only looking at three strings/regexes that you want to match, but it's definitely the way to go if you have a substantially larger target set - the project I initially used it on has a library of some 1500 terms that it's matching against and it performs very well.
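Regexp::Assemble is a CPAN module and its API is not shown here. As a rough core-Perl stand-in for the same idea when the terms are plain literal strings (one combined regex built once and reused), here is a sketch with a hypothetical term list. Sorting longer terms first matters because Perl's alternation takes the first branch that matches at a given position, not the longest:

```perl
use strict;
use warnings;

# Hypothetical term list; a real one might hold hundreds or thousands of entries.
my @terms = ("Sony", "Sony Ericsson", "HP", "Canon");

# Escape each term, put longer terms first so the longest term wins when two
# start at the same position, and compile the alternation once with qr//.
my $alternation = join "|", map { quotemeta($_) } sort { length($b) <=> length($a) } @terms;
my $super_re    = qr/($alternation)/;

my ($matched) = "New Sony Ericsson phone" =~ $super_re;
print "$matched\n";    # Sony Ericsson
```

As far as I know, Perl 5.10 and later compile such alternations of fixed strings into a trie internally, which helps with large term lists; Regexp::Assemble goes further by also merging arbitrary sub-regexes.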