php: Split a camelCase word into words with PHP preg_match (regular expression)

Question

Asked by Good-bye

How would I go about splitting the word:

oneTwoThreeFour

into an array so that I can get:

one Two Three Four

with preg_match?

I tried this, but it just gives the whole word:

$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);

Answer 1

Answered by codaddict

You can also use preg_match_all as:

preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);

Explanation:

(        - Start of capturing parenthesis.
 (?:     - Start of non-capturing parenthesis.
  ^      - Start anchor.
  |      - Alternation.
  [A-Z]  - Any one capital letter.
 )       - End of non-capturing parenthesis.
 [a-z]+  - One or more lowercase letters.
)        - End of capturing parenthesis.
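A minimal runnable sketch of the preg_match_all() call above, applied to the question's input. The variable names are illustrative. Note that the words end up in a sub-array of $matches; the function's return value is only the match count.

```php
<?php
// Illustrative input taken from the question.
$str = 'oneTwoThreeFour';

// preg_match_all() returns the number of matches; the words go into $matches.
$count = preg_match_all('/((?:^|[A-Z])[a-z]+)/', $str, $matches);

// $matches[0] holds the full matches and $matches[1] the capturing group.
// They are identical here because the group wraps the whole pattern.
$words = $matches[1];

echo implode(' ', $words), "\n"; // one Two Three Four
```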

Answer 2

Answered by codaddict

You can use preg_split as:

$arr = preg_split('/(?=[A-Z])/',$str);

I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), is a lookahead that matches the empty position just before an uppercase letter, so the split consumes no characters.
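A short sketch of the lookahead split. The TitleCase input and the PREG_SPLIT_NO_EMPTY flag are additions for illustration: when the string starts with a capital letter, the lookahead also matches at position 0, which can yield an empty first element, and the flag filters such empty pieces out.

```php
<?php
// Split before every uppercase letter; the lookahead consumes no characters.
$camel = preg_split('/(?=[A-Z])/', 'oneTwoThreeFour');

// Hypothetical TitleCase input: the flag drops the empty piece that a
// split at position 0 would otherwise produce.
$title = preg_split('/(?=[A-Z])/', 'HelloWorld', -1, PREG_SPLIT_NO_EMPTY);

print_r($camel);
print_r($title);
```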

Answer 3

Answered by ridgerunner

I know that this is an old question with an accepted answer, but IMHO there is a better solution:

<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
    # Split camelCase "words". Two global alternatives. Either g1of2:
      (?<=[a-z])      # Position is after a lowercase,
      (?=[A-Z])       # and before an uppercase letter.
    | (?<=[A-Z])      # Or g2of2; Position is after uppercase,
      (?=[A-Z][a-z])  # and before upper-then-lower case.
    /x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
    printf("Word %d of %d = \"%s\"\n",
        $i + 1, $count, $a[$i]);
}
?>

Note that this regex (like codaddict's '/(?=[A-Z])/' solution, which works like a charm for well-formed camelCase words) matches only a position within the string and consumes no text at all. This solution has the additional benefit that it also works correctly for not-so-well-formed pseudo-camelCase words such as StartsWithCap and hasConsecutiveCAPS.
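To see that extra benefit in action, here is a small comparison sketch. The helper name splitWith() is hypothetical, and ridgerunner's two-alternative pattern is written on one line without the x flag. It runs both position-only regexes over hasConsecutiveCAPS.

```php
<?php
// Hypothetical helper: split $input on a position-only regex.
function splitWith(string $re, string $input): array
{
    return preg_split($re, $input, -1, PREG_SPLIT_NO_EMPTY);
}

// codaddict: split before every uppercase letter.
$plain = splitWith('/(?=[A-Z])/', 'hasConsecutiveCAPS');

// ridgerunner: lower->Upper boundary, or Upper->Upper+lower boundary.
$ridgeRe = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';
$ridge = splitWith($ridgeRe, 'hasConsecutiveCAPS');

echo implode(' ', $plain), "\n"; // has Consecutive C A P S
echo implode(' ', $ridge), "\n"; // has Consecutive CAPS
```

The plain lookahead breaks the all-caps run into single letters, while the second alternative only splits an uppercase run where a new capitalized word begins.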

Input:

oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule

Output:

Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"

Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"

Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"

Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"

Edited 2014-04-12: Modified the regex, script and test data to correctly split the "NewNASAModule" case (in response to rr's comment).

Answer 4

Answered by blak3r

A functionized version of @ridgerunner's answer.

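/**
 * Converts a camelCase string into space-separated words.
 * Uses only the first alternative of @ridgerunner's pattern, so an acronym
 * stays attached to the following word ("NewNASAModule" -> "New NASAModule").
 * @param string $camelCaseString
 * @return string
 */
function fromCamelCase($camelCaseString) {
        $re = '/(?<=[a-z])(?=[A-Z])/';
        $a = preg_split($re, $camelCaseString);
        // implode(separator, array): the (array, separator) order was removed in PHP 8
        return implode(" ", $a);
}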

Answer 5

Answered by rr-

While ridgerunner's answer works great, it seems not to work with all-caps substrings that appear in the middle of the string. I use the following, and it seems to deal with these just fine:

function splitCamelCase($input)
{
    return preg_split(
        '/(^[^A-Z]+|[A-Z][^A-Z]+)/',
        $input,
        -1, /* no limit on the number of pieces */
        PREG_SPLIT_NO_EMPTY /* don't return empty elements */
            | PREG_SPLIT_DELIM_CAPTURE /* also return the captured words, which act as the delimiters here */
    );
}

Some test cases:

assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);

Answer 6

Answered by ArtisticPheonix

$string = preg_replace( '/([a-z0-9])([A-Z])/', '$1 $2', $string );

The trick is that the pattern matches a lowercase letter or digit ($1) immediately followed by an uppercase letter ($2), and the replacement '$1 $2' puts both back with a space between them. For example, in helloWorld, $1 matches "o" and $2 matches "W", so the result is "hello World"; HelloWorld likewise becomes "Hello World". You can then lowercase the words, uppercase the first word, explode() the result on the space, or use _ or some other character as the separator instead.
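A minimal sketch of this approach wrapped in a function; the name camelToWords() and its $separator parameter are hypothetical. Using ${1} and ${2} keeps the group references unambiguous even if the separator starts with a digit. It assumes the separator contains no $ or backslash, which preg_replace() would interpret in the replacement string.

```php
<?php
// Hypothetical helper: insert $separator between a lowercase letter or digit
// ($1) and the uppercase letter that follows it ($2).
// Assumes $separator contains no '$' or '\', which preg_replace() interprets.
function camelToWords(string $string, string $separator = ' '): string
{
    return preg_replace('/([a-z0-9])([A-Z])/', '${1}' . $separator . '${2}', $string);
}

echo camelToWords('helloWorld'), "\n";                  // hello World
echo strtolower(camelToWords('HelloWorld', '_')), "\n"; // hello_world

// Explode on the separator to get an array of words.
$words = explode(' ', camelToWords('oneTwoThreeFour'));
```

Because only a lowercase-to-uppercase boundary is matched, an acronym stays attached to the next word: NewNASAModule becomes "New NASAModule".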

Short and simple.

Answer 7

Answered by mickmackusa

When determining the best pattern for your project, you will need to consider the following pattern factors:

Accuracy (Robustness) -- whether the pattern is correct in all cases and is reasonably future-proof
Efficiency -- the pattern should be direct, deliberate, and avoid unnecessary labor
Brevity -- the pattern should use appropriate techniques to avoid unnecessary character length
Readability -- the pattern should be kept as simple as possible

The above factors also happen to be in the hierarchical order that I strive to obey. In other words, it doesn't make much sense to me to prioritize 2, 3, or 4 when 1 doesn't quite satisfy the requirements. Readability is at the bottom of the list for me because in most cases I can follow the syntax.

Capture Groups and Lookarounds often impact pattern efficiency. The truth is, unless you are executing this regex on thousands of input strings, there is no need to toil over efficiency. It is perhaps more important to focus on pattern readability which can be associated with pattern brevity.

Some patterns below will require some additional handling/flagging by their preg_ function, but here are some pattern comparisons based on the OP's sample input:

preg_split() patterns:

/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...for the record, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)

preg_match_all() patterns:

/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)

I'll point out that there is a subtle difference between the output of preg_match_all() and preg_split(). preg_match_all() will output a 2-dimensional array; in other words, all of the full-string matches will be in the [0] subarray, and if a capture group is used, those substrings will be in the [1] subarray. On the other hand, preg_split() only outputs a 1-dimensional array and therefore provides a less bloated and more direct path to the desired output.
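A small sketch of that difference on the OP's input, using two of the patterns listed above; the variable names are illustrative.

```php
<?php
$input = 'oneTwoThreeFour';

// preg_match_all(): a 2-D array, [0] = full matches, [1] = capture group 1.
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $input, $out);
$fullMatches = $out[0];
$group1      = $out[1];

// preg_split(): the 1-D list of pieces is the return value itself.
$pieces = preg_split('/(?=[A-Z])/', $input);

var_export($out);
echo "\n";
var_export($pieces);
```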

Some of the patterns are insufficient when dealing with camelCase strings that contain an ALLCAPS/acronym substring in them. If this is a fringe case that is possible within your project, it is logical to only consider patterns that handle these cases correctly. I will not be testing TitleCase input strings because that is creeping too far from the question.

New Extended Battery of Test Strings:

oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain

Suitable preg_split() patterns:

/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps) *I had to use [a-z] for the demo to count properly
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)

Suitable preg_match_all() pattern:

/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/(75 steps)

Finally, here are my recommendations based on my pattern principles / factor hierarchy. Also, I recommend preg_split() over preg_match_all() (despite the preg_match_all() patterns having fewer steps) because preg_split() returns the desired output structure directly. (Of course, choose whatever you like.)

Code:

$noAcronyms = 'oneTwoThreeFour';
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);

Code:

$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);
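As a quick consistency check, here is a sketch that runs the two acronym-aware patterns from the code above over the whole extended battery of test strings and collects both results per input; the array names are illustrative.

```php
<?php
// The extended battery of test strings from above.
$battery = ['oneTwoThreeFour', 'hasConsecutiveCAPS', 'newNASAModule', 'USAIsGreatAgain'];

// The two acronym-aware patterns used in the second code sample above.
$splitPattern = '~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~';
$matchPattern = '~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~';

$viaSplit = [];
$viaMatch = [];
foreach ($battery as $input) {
    // preg_split(): limit 0 means "no limit"; drop the trailing empty piece.
    $viaSplit[$input] = preg_split($splitPattern, $input, 0, PREG_SPLIT_NO_EMPTY);
    // preg_match_all(): the words are the full matches in $out[0].
    $viaMatch[$input] = preg_match_all($matchPattern, $input, $out) ? $out[0] : [];
    echo $input, ' => ', implode(' | ', $viaSplit[$input]), "\n";
}
```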

Answer 8

Answered by Jarrod

I took cool guy Ridgerunner's code (above) and made it into a function:

echo deliciousCamelcase('NewNASAModule');

function deliciousCamelcase($str)
{
    $formattedStr = '';
    $re = '/
          (?<=[a-z])
          (?=[A-Z])
        | (?<=[A-Z])
          (?=[A-Z][a-z])
        /x';
    $a = preg_split($re, $str);
    $formattedStr = implode(' ', $a);
    return $formattedStr;
}

This will return: New NASA Module

Answer 9

Answered by Kobi

Another option is matching /[A-Z]?[a-z]+/. If you know your input is in the right format, it should work nicely.

[A-Z]? would match an uppercase letter (or nothing). [a-z]+ would then match all following lowercase letters, until the next match.
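A minimal sketch of this pattern with preg_match_all(); the helper name camelWords() is hypothetical. The second call illustrates the "right format" caveat: a run of capitals with no lowercase letters after it is skipped entirely rather than returned as a word.

```php
<?php
// Hypothetical helper around Kobi's pattern; the pattern has no capture
// groups, so the words are the full matches in $m[0].
function camelWords(string $input): array
{
    preg_match_all('/[A-Z]?[a-z]+/', $input, $m);
    return $m[0];
}

$wellFormed = camelWords('oneTwoThreeFour');    // one, Two, Three, Four
$allCaps    = camelWords('hasConsecutiveCAPS'); // has, Consecutive ("CAPS" is dropped)
```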

Working example: https://regex101.com/r/kNZfEI/1

Answer 10

Answered by Daniel Rhodes

You can split on a "glide" from lowercase to uppercase thus:

$parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
// PREG_SPLIT_DELIM_CAPTURE also returns the bracketed (captured) letters;
// both letters are captured so that the uppercase letter is not lost
var_dump($parts);

Annoyingly, you will then have to rebuild the words from the pieces and captured letters in $parts.
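One way to do that rebuilding, as a sketch. The helper name rebuildWords() is hypothetical. It assumes the pattern captures both letters of each lowercase-to-uppercase glide ('/([a-z])([A-Z])/'). Each captured lowercase letter then closes the current word, and each captured uppercase letter opens the next one.

```php
<?php
// Hypothetical helper: split on each lowercase-to-uppercase "glide",
// capturing BOTH letters, then glue the letters back onto their words.
function rebuildWords(string $string): array
{
    $parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    // Layout: piece, lower, upper, piece, lower, upper, ..., piece
    $words = [];
    $current = $parts[0];
    for ($i = 1, $n = count($parts); $i < $n; $i += 3) {
        $words[] = $current . $parts[$i];           // close the word with its last lowercase letter
        $current = $parts[$i + 1] . $parts[$i + 2]; // open the next word with its uppercase letter
    }
    $words[] = $current;
    return $words;
}

print_r(rebuildWords('oneTwoThreeFour'));
```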

Hope this helps
