Parsing CSS by regex (PHP)
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/236979/
Asked by Ross
I'm creating a CSS editor and am trying to create a regular expression that can get data from a CSS document. This regex works if I have one property but I can't get it to work for all properties. I'm using preg/perl syntax in PHP.
Regex
(?<selector>[A-Za-z]+[\s]*)[\s]*{[\s]*((?<properties>[A-Za-z0-9-_]+)[\s]*:[\s]*(?<values>[A-Za-z0-9#, ]+);[\s]*)*[\s]*}
Test case
body { background: #f00; font: 12px Arial; }
Expected Outcome
Array(
    [0] => Array(
        [0] => body { background: #f00; font: 12px Arial; }
        [selector] => Array(
            [0] => body
        )
        [1] => Array(
            [0] => body
        )
        [2] => font: 12px Arial;
        [properties] => Array(
            [0] => font
        )
        [3] => Array(
            [0] => font
        )
        [values] => Array(
            [0] => 12px Arial
            [1] => background: #f00
        )
        [4] => Array(
            [0] => 12px Arial
            [1] => background: #f00
        )
    )
)
Real Outcome
Array(
    [0] => Array
        (
            [0] => body { background: #f00; font: 12px Arial; }
            [selector] => body
            [1] => body
            [2] => font: 12px Arial;
            [properties] => font
            [3] => font
            [values] => 12px Arial
            [4] => 12px Arial
        )
)
Thanks in advance for any help - this has been confusing me all afternoon!
Answer by Tanktalus
That just seems too convoluted for a single regular expression. Well, I'm sure that with the right extensions, an advanced user could create the right regex. But then you'd need an even more advanced user to debug it.
Instead, I'd suggest using a regex to pull out the pieces, and then tokenising each piece separately. e.g.,
/([^{]+)\s*\{\s*([^}]*?)\s*}/
Then you end up with the selector and the attributes in separate fields, and then split those up. (Even the selector will be fun to parse.) Note that even this will have pains if }'s can appear inside quotes or something. You could, again, convolute the heck out of it to avoid that, but it's probably even better to avoid regex's altogether here, and handle it by parsing one field at a time, perhaps by using a recursive-descent parser or yacc/bison or whatever.
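As a rough illustration of the "pull out the pieces, then tokenise each piece" idea, here is a minimal sketch. The helper names parseCssBlocks() and parseDeclarations() are made up for this example, and it assumes flat CSS with no comments, no nested blocks, and no "}" or ";" inside quoted strings or url(...) values.

```php
<?php
// Sketch only: parseCssBlocks() and parseDeclarations() are hypothetical helpers.

/** Step 1: pull out "selector { body }" pieces with one simple regex. */
function parseCssBlocks(string $css): array
{
    $rules = [];
    preg_match_all('/([^{}]+)\{([^}]*)\}/', $css, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // A selector that appears twice overwrites its earlier entry.
        $rules[trim($m[1])] = parseDeclarations($m[2]);
    }
    return $rules;
}

/** Step 2: tokenise one block body into property => value pairs. */
function parseDeclarations(string $body): array
{
    $declarations = [];
    foreach (explode(';', $body) as $declaration) {
        // A limit of 2 keeps colons inside values (e.g. URLs) intact.
        $parts = explode(':', $declaration, 2);
        if (count($parts) === 2) {
            $declarations[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $declarations;
}

$rules = parseCssBlocks('body { background: #f00; font: 12px Arial; }');
print_r($rules);
```

Keeping the two steps separate means each regex or split stays small enough to debug on its own, which is the point being made above.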
Answer by Andru Luvisi
You are trying to pull structure out of the data, and not just individual values. Regular expressions could be painfully stretched to do the job, but you are really entering parser territory, and should be pulling out the big guns, namely parsers.
I have never used the PHP parser generating tools, but they look okay after a light scan of the docs. Check out LexerGenerator and ParserGenerator. LexerGenerator will take a bunch of regular expressions describing the different types of tokens in a language (in this case, CSS) and spit out some code that recognizes the individual tokens. ParserGenerator will take a grammar, a description of what things in a language are made up of what other things, and spit out a parser, code that takes a bunch of tokens and returns a syntax tree (the data structure that you are after).
Answer by Jacek Lange
Do not use your own regex for parsing CSS. Why reinvent the wheel while there is code waiting for you, ready to use and (hopefully) bug-free?
There are two generally available classes that can parse CSS for you:
HTML_CSS PEAR package at pear.php.net
and
CSS Parser class at PHPClasses
Answer by dbr
I would recommend against using regexes to parse CSS - especially in a single regex!
If you insist on doing the parsing in regexes, split it up into sensible sections - use one regex to split all the body{..} blocks, then another to parse the color:rgb(1,2,3); attributes.
If you are actually trying to write something "useful" (not trying to learn regular expressions), look for a prewritten CSS parser.
I found this cssparser.php, which seems to work very well:
$cssp = new cssparser;
$cssp -> ParseStr("body { background: #f00;font: 12px Arial; }");
print_r($cssp->css);
..which outputs the following:
Array
(
    [body] => Array
        (
            [background] => #f00
            [font] => 12px arial
        )
)
The parser is pretty simple, so it should be easy to work out what it's doing. Oh, I had to remove the lines that read if($this->html) {$this->Add("VAR", "");} (it seems to be a debugging thing that was left in).
I've mirrored the script here, with the above changes in.
Answer by Nick Franceschina
I am using the regex below and it pretty much works... of course this question is old now and I see that you've abandoned your efforts... but in case someone else runs across it:
(?<selector>(?:(?:[^,{]+),?)*?)\{(?:(?<name>[^}:]+):?(?<value>[^};]+);?)*?\}
(hafta remove all of the /* comments */ from your CSS first to be safe)
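One way to do that comment-stripping step is sketched below. stripCssComments() is a hypothetical helper name, and the sketch assumes that "/*" never appears inside a quoted string, since it would be stripped there too.

```php
<?php
// Sketch only: stripCssComments() is a hypothetical helper name.
function stripCssComments(string $css): string
{
    // Non-greedy match with the "s" flag so a comment may span several lines.
    // preg_replace() returns null on a regex error; fall back to the input then.
    return preg_replace('~/\*.*?\*/~s', '', $css) ?? $css;
}

$css = "body { /* page */ color: red; }\n/* p { color: blue; } */";
$clean = stripCssComments($css);
echo $clean;
```

Running this first keeps braces, colons, and semicolons that sit inside comments from confusing the rule-matching regex.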
Answer by Dan
I wrote a piece of code that easily parses CSS. All you have to do is a couple of explodes, really... The $css variable is a string of the CSS. Then just do a print_r($css_array) to get a nice array of CSS, fully parsed.
$css_array = array(); // master array to hold all values
$elements = explode('}', $css);
foreach ($elements as $element) {
    // get the name of the CSS element (the selector before the "{")
    $a_name = explode('{', $element, 2);
    if (count($a_name) < 2) {
        continue; // no "{": leftover text after the last "}"
    }
    $name = trim($a_name[0]);
    // get all the key:value pair styles from the block body
    $a_styles = explode(';', $a_name[1]);
    // loop through each style and split apart the key from the value
    foreach ($a_styles as $style) {
        // a limit of 2 keeps colons inside values (e.g. URLs) intact
        $a_key_value = explode(':', $style, 2);
        if (count($a_key_value) == 2) {
            // build the master css array
            $css_array[$name][trim($a_key_value[0])] = trim($a_key_value[1]);
        }
    }
}
Gives you this:
Array
(
    [body] => Array
        (
            [background] => #f00
            [font] => 12px arial
        )
)
Answer by CTS_AE
Building off of the current answer by Tanktalus, there are a couple of improvements and edge cases to note.
CSS Parsing Regex
\s*([^{]+)\s*\{\s*([^}]*?)\s*}
This regex does some whitespace trimming and handles some additional edge cases, as listed in this example: https://regex101.com/r/qQRIHx/5
Key:value pairs; pitfalls of a more complicated regex
I too started to work on delimiting the key:value pairs, but quickly saw that when there were multiple styles per selector, things got trickier than I wanted. You can view version 1 of the regex, where I tried to delimit the key:values, and how it failed with multiple declarations here: https://regex101.com/r/qQRIHx/1
Implementation
As others mentioned, you should break this up into multiple steps to parse and tokenize your CSS. This regex will help you obtain the declarations, but you will then need to parse those out.
Declaration Parser
You could use something like this to parse the declarations after you get your first set of matches.
([^:\s]+)*\s*:\s*([^;]+);
Example: https://regex101.com/r/py9OKO/1/
Edge Case: The above example works great with multiple declarations, but a block may contain just one declaration with no terminating semicolon, which [most] browsers will render but which breaks this regex.
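A possible way around that edge case is to make the trailing semicolon optional. The sketch below is a variation on the declaration regex, not the regex from the regex101 link, and extractDeclarations() is a hypothetical helper name. It still assumes no ";" inside quoted strings or url(...) values.

```php
<?php
// Sketch only: extractDeclarations() is a hypothetical helper name.
function extractDeclarations(string $body): array
{
    // "(?:;|$)" accepts either a semicolon or the end of the string, so a
    // final declaration without ";" still matches. The lazy "+?" plus "\s*"
    // keeps trailing whitespace out of the captured value.
    preg_match_all('/([^:;\s]+)\s*:\s*([^;]+?)\s*(?:;|$)/', $body, $matches, PREG_SET_ORDER);
    $declarations = [];
    foreach ($matches as $match) {
        $declarations[$match[1]] = $match[2];
    }
    return $declarations;
}

print_r(extractDeclarations('background: #f00; font: 12px Arial'));
```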
Noted Cases
You may also need to account for nested rules when there's a media query. In this case I would try running the CSS-matching regex against the declarations that are extracted. If you get matches, you could recurse on them (although I'm not sure there are cases where vanilla CSS nests more than one level deep).
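Note that the block regex above stops at the first "}", so the body it captures for a media query is cut short. The sketch below therefore swaps the regex for a simple brace-depth counter and recurses by calling the same function on a block's body. splitTopLevelBlocks() and the 'prelude'/'body' keys are names invented for this example, and braces inside quoted strings are still not handled.

```php
<?php
// Sketch only: splitTopLevelBlocks() is a hypothetical helper. It replaces the
// regex with a brace-depth counter so a nested "}" does not end a block early.
function splitTopLevelBlocks(string $css): array
{
    $blocks = [];
    $depth = 0;
    $prelude = ''; // text before the opening "{" (a selector or "@media ...")
    $body = '';    // text between the matching outer braces
    $length = strlen($css);
    for ($i = 0; $i < $length; $i++) {
        $char = $css[$i];
        if ($char === '{') {
            if ($depth > 0) {
                $body .= $char; // nested brace: keep it in the body
            }
            $depth++;
        } elseif ($char === '}') {
            if ($depth === 0) {
                continue; // stray closing brace: ignore it
            }
            $depth--;
            if ($depth === 0) {
                $blocks[] = ['prelude' => trim($prelude), 'body' => trim($body)];
                $prelude = '';
                $body = '';
            } else {
                $body .= $char;
            }
        } elseif ($depth === 0) {
            $prelude .= $char;
        } else {
            $body .= $char;
        }
    }
    return $blocks;
}

$css = '@media screen { body { color: red; } } p { margin: 0; }';
$blocks = splitTopLevelBlocks($css);
// Recurse one level: a body that still contains "{" holds nested rules.
$nested = splitTopLevelBlocks($blocks[0]['body']);
print_r($blocks);
print_r($nested);
```

A body with no "{" in it is a plain declaration list and can go straight to the declaration parser.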
Edge Case - This doesn't handle a right curly bracket in a string
Tomorrow's Research
I've decided to instead use an npm package like css or cssom. I know this question is about PHP, but a package is going to do a lot of heavy lifting for me and handle the edge cases I keep running into.
Edit:
I ended up using Jotform's public css.js library. It has a really small footprint which was one of the main requirements I had when choosing libraries to parse CSS.
- https://github.com/jotform/css.js/tree/master
- They also published this article explaining their process:
Answer by Poseidon
Try this
function trimStringArray($stringArray){
    $result = array();
    foreach ($stringArray as $string) {
        $trimmed = trim($string);
        if ($trimmed != '') $result[] = $trimmed;
    }
    return $result;
}
// $style holds the CSS source as a string
$regExp = '/\{|\}/';
$rawCssData = preg_split($regExp, $style);
$cssArray = array();
$cssStyle = array();
for ($i = 0; $i < count($rawCssData); $i++) {
    if ($i % 2 == 0) {
        // even pieces are selector lists
        // explode() replaces split(), which was removed in PHP 7
        $selectors = explode(',', $rawCssData[$i]);
        $cssStyle['selectors'] = trimStringArray($selectors);
    }
    if ($i % 2 == 1) {
        // odd pieces are block bodies
        $attributes = explode(';', $rawCssData[$i]);
        $cssStyle['attributes'] = trimStringArray($attributes);
        $cssArray[] = $cssStyle;
    }
}
//return false;
echo '<pre>'."\n";
print_r($cssArray);
echo '</pre>'."\n";

