Original question: http://stackoverflow.com/questions/2202435/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
PHP explode the string, but treat words in quotes as a single word
Question by timofey
How can I explode the following string:
Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor
into
array("Lorem", "ipsum", "dolor sit amet", "consectetur", "adipiscing elit", "dolor")
So that the text in quotation is treated as a single word.
Here's what I have for now:
$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
$noquotes = str_replace("%22", "", $mytext);
$newarray = explode(" ", $noquotes);
But my code splits every word into its own array element. How do I make the words inside quotation marks be treated as one word?
Answer by Bart Kiers
You could use a preg_match_all(...):
$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \"elit" dolor';
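// '\\\\' inside a single-quoted PHP string becomes '\\' in the regex, i.e. one literal backslash
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);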
print_r($matches);
which will produce:
Array
(
[0] => Array
(
[0] => Lorem
[1] => ipsum
[2] => "dolor sit amet"
[3] => consectetur
[4] => "adipiscing \"elit"
[5] => dolor
)
)
And as you can see, it also accounts for escaped quotes inside quoted strings.
EDIT
A short explanation:
"              # match the character '"'
(?:            # start non-capture group 1
  \\           #   match the character '\'
  .            #   match any character except line breaks
  |            #   OR
  [^\\"]       #   match any character except '\' and '"'
)*             # end non-capture group 1 and repeat it zero or more times
"              # match the character '"'
|              # OR
\S+            # match a non-whitespace character: [^\s] and repeat it one or more times
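The matches above keep the surrounding quotes (and the backslash in \"), while the question asks for bare words. A minimal post-processing sketch, assuming that bare output is what is wanted: split_quoted() is a hypothetical helper name, and turning every \x into x is an assumption about how escapes should be read.

```php
<?php
// Sketch: tokenise with the regex from the answer, then strip the outer
// quotes of quoted tokens and unescape \x to x (e.g. \" becomes ").
function split_quoted(string $text): array
{
    // A quoted run (allowing backslash escapes) or a run of non-whitespace.
    preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);

    $words = [];
    foreach ($matches[0] as $token) {
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Drop the outer quotes, then remove the backslash of each escape.
            $token = preg_replace('/\\\\(.)/s', '$1', substr($token, 1, -1));
        }
        $words[] = $token;
    }
    return $words;
}

print_r(split_quoted('Lorem ipsum "dolor sit amet" consectetur "adipiscing \"elit" dolor'));
```

Unquoted tokens are returned untouched, so a backslash outside quotes is kept as-is.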
And in case of matching %22 instead of double quotes, you'd do:
preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
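Applied to the URL-encoded string from the question, a sketch that also strips the %22 markers from quoted tokens could look like this (split_percent_quoted() is a hypothetical name, not part of the answer):

```php
<?php
// Sketch: match %22...%22 runs with the regex above, then remove the
// leading and trailing %22 from each quoted token.
function split_percent_quoted(string $text): array
{
    preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        // A quoted token starts and ends with %22, so it is at least 6 bytes long.
        if (strlen($token) >= 6 && substr($token, 0, 3) === '%22' && substr($token, -3) === '%22') {
            return substr($token, 3, -3);
        }
        return $token;
    }, $matches[0]);
}

$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
print_r(split_percent_quoted($mytext));
```

Alternatively, decoding the input first with rawurldecode() turns every %22 back into a plain double quote, after which the plain-quote solutions apply unchanged.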
Answer by Petah
This would have been much easier with str_getcsv().
$test = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
var_dump(str_getcsv($test, ' '));
Gives you
array(6) {
[0]=>
string(5) "Lorem"
[1]=>
string(5) "ipsum"
[2]=>
string(14) "dolor sit amet"
[3]=>
string(11) "consectetur"
[4]=>
string(15) "adipiscing elit"
[5]=>
string(5) "dolor"
}
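One caveat: with ' ' as the separator, every extra space produces an empty field, so "a  b" yields "a", "" and "b". A small wrapper sketch that filters those out (split_words() is a hypothetical name; the enclosure and escape arguments are just the defaults, passed explicitly):

```php
<?php
// Sketch: str_getcsv() with a space separator, dropping the empty fields
// produced by runs of more than one space.
function split_words(string $text): array
{
    // Arguments: string, separator, enclosure, escape.
    $fields = str_getcsv($text, ' ', '"', '\\');

    // Runs of spaces give '' fields; an empty input gives [null].
    return array_values(array_filter(
        $fields,
        fn ($field) => $field !== null && $field !== ''
    ));
}

print_r(split_words('Lorem  ipsum "dolor sit amet"   consectetur'));
```

array_values() renumbers the keys, so the result is a plain 0-indexed list like the one above.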
Answer by Nikz
You can also try this multiple explode function
function multiexplode($delimiters, $string)
{
    // Replace every delimiter with the first one, then split on that one.
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return $launch;
}
$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);
print_r($exploded);
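Note that this splits on every delimiter and does not treat quoted text specially. The same multi-delimiter split can also be written as a single preg_split() call; a sketch, with split_on_any() as a hypothetical name and assuming a non-empty list of non-empty delimiters:

```php
<?php
// Sketch: split on any of several delimiters with one preg_split() call.
// preg_quote() escapes regex metacharacters such as '.' and '|', so each
// delimiter (including multi-character ones) is matched literally.
function split_on_any(array $delimiters, string $string): array
{
    $escaped = array_map(fn (string $d) => preg_quote($d, '/'), $delimiters);
    return preg_split('/' . implode('|', $escaped) . '/', $string);
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
print_r(split_on_any([",", ".", "|", ":"], $text));
```

Because the delimiters are joined as alternatives, the input string is never rewritten before splitting.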
Answer by starbeamrainbowlabs
I came here with a complex string splitting problem similar to this, but none of the answers here did exactly what I wanted - so I wrote my own.
I am posting it here just in case it is helpful to someone else.
This is probably a very slow and inefficient way to do it - but it works for me.
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach ($chars as $char)
    {
        if (in_array($char, $openers))
            $depth++;
        elseif (in_array($char, $closers))
            $depth--;
        elseif (in_array($char, $togglers))
        {
            if ($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                $depth++; // we are outside a toggle block, enter it and increase the depth
            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        if ($depth < 0) $depth = 0;

        if (in_array($char, $delimiters) &&
            $depth == 0 &&
            !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if (strlen($nextpart) > 0)
        $parts[] = $nextpart;
    return $parts;
}
Usage is as follows. explode_adv takes 5 arguments:
- An array of characters that open a block, e.g. [, (, etc.
- An array of characters that close a block, e.g. ], ), etc.
- An array of characters that toggle a block, e.g. ", ', etc.
- An array of characters that should cause a split into the next part.
- The string to work on.
This method probably has flaws - edits are welcome.
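explode_adv() drops the opener, closer and toggler characters from its output. For comparison, here is a hypothetical variant of the same depth-counting idea (split_top_level() is an illustrative name, not part of the answer) that keeps brackets and quotes in each part; it assumes a single-byte delimiter, and brackets inside quotes are ignored.

```php
<?php
// Hypothetical variant of the depth-counting idea behind explode_adv():
// split on a delimiter only at depth 0 and outside quotes, but keep the
// bracket and quote characters in the resulting parts.
function split_top_level(
    string $str,
    string $delimiter = ' ',
    array $pairs = ['(' => ')', '[' => ']'],
    array $quotes = ['"', "'"]
): array {
    $parts = [];
    $current = '';
    $depth = 0;      // nesting depth of open brackets
    $inQuote = null; // the quote character we are inside, or null
    foreach (str_split($str) as $char) {
        if ($inQuote !== null) {
            // Inside quotes only the matching quote ends the quoted run.
            if ($char === $inQuote) {
                $inQuote = null;
            }
        } elseif (in_array($char, $quotes, true)) {
            $inQuote = $char;
        } elseif (isset($pairs[$char])) {
            $depth++;
        } elseif (in_array($char, $pairs, true) && $depth > 0) {
            $depth--;
        } elseif ($char === $delimiter && $depth === 0) {
            // Top-level delimiter: finish the current part, drop the delimiter.
            $parts[] = $current;
            $current = '';
            continue;
        }
        $current .= $char;
    }
    if ($current !== '') {
        $parts[] = $current;
    }
    return $parts;
}

print_r(split_top_level('a (b c) "d e" [f g]'));
```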
Answer by cleong
In some situations the little-known token_get_all() might prove useful:
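// $text is assumed to hold the string from the question
$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
$tokens = token_get_all("<?php $text ?>");
$separator = ' ';
$items = array();
$item = "";
$last = count($tokens) - 1;
foreach ($tokens as $index => $token) {
    // skip the opening "<?php " tag and the closing "?>" tag
    if ($index != 0 && $index != $last) {
        // multi-character tokens are [id, text, line] arrays, single characters are plain strings
        if (is_array($token)) {
            if ($token[0] == T_CONSTANT_ENCAPSED_STRING) {
                // a quoted string: drop the surrounding quotes
                $token = substr($token[1], 1, -1);
            } else {
                $token = $token[1];
            }
        }
        if ($token == $separator) {
            $items[] = $item;
            $item = "";
        } else {
            $item .= $token;
        }
    }
}
print_r($items);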
Results:
Array
(
[0] => Lorem
[1] => ipsum
[2] => dolor sit amet
[3] => consectetur
[4] => adipiscing elit
[5] => dolor
)

