PHP regex to strip comments, multi-line comments, and empty lines

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/643113/

Time: 2020-08-24 23:21:44  Source: igfitidea

Regex to strip comments and multi-line comments and empty lines

Tags: php, regex, preg-replace

Asked by Ahmad Fouad

I want to parse a file and I want to use php and regex to strip:

  • blank or empty lines
  • single line comments
  • multi line comments

Basically, I want to remove any line containing

/* text */ 

or multi-line comments

/***
some
text
*****/

If possible, another regex to check if the line is empty (Remove blank lines)

Is that possible? Can somebody post a regex that does just that?

Thanks a lot.

Answer by chaos

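// Remove /* ... */ comments; the s modifier lets . match newlines
$text = preg_replace('!/\*.*?\*/!s', '', $text);
// Collapse blank or whitespace-only lines into a single newline
$text = preg_replace('/\n\s*\n/', "\n", $text);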
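
To see the two calls above in action, here is a minimal sketch that wraps them in a function and runs them on the multi-line comment from the question. The function name strip_block_comments and the sample text are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the two preg_replace calls from the answer.
function strip_block_comments(string $text): string
{
    // Remove /* ... */ comments; the "s" modifier lets "." span newlines.
    $text = preg_replace('!/\*.*?\*/!s', '', $text);
    // Collapse runs of blank (or whitespace-only) lines into one newline.
    $text = preg_replace('/\n\s*\n/', "\n", $text);
    return $text;
}

$sample = "first\n/***\nsome\ntext\n*****/\n\nsecond\n";
echo strip_block_comments($sample);
```

Note that this sketch only covers /* ... */ comments; single-line // or # comments are left untouched.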

Answer by Chris Lutz

Keep in mind that any regex you use will fail if the file you're parsing has a string containing something that matches these conditions. For example, it would turn this:

print "/* a comment */";

Into this:

print "";

Which is probably not what you want. But maybe it is, I don't know. Anyway, regexes technically can't parse data in a manner to avoid that problem. I say technically because modern PCRE regexes have tacked on a number of hacks to make them both capable of doing this and, more importantly, no longer regular expressions, but whatever. If you want to avoid stripping these things inside quotes or in other situations, there is no substitute for a full-blown parser (although it can still be pretty simple).
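
The problem, and one of the PCRE "hacks" mentioned above, can be sketched as follows. The function names are hypothetical, and the second pattern is only an assumption about how one might guard double-quoted strings with the PCRE backtracking verbs (*SKIP)(*F); it is not something the answer itself proposes.

```php
<?php
// Naive version: strips every /* ... */, even inside string literals.
function strip_comments_naive(string $source): string
{
    return preg_replace('!/\*.*?\*/!s', '', $source);
}

// Sketch: match double-quoted strings first and discard that match with
// the PCRE verbs (*SKIP)(*F), so only comments outside them are removed.
function strip_comments_outside_strings(string $source): string
{
    $pattern = '~"(?:\\\\.|[^"\\\\])*"(*SKIP)(*F)|/\*.*?\*/~s';
    return preg_replace($pattern, '', $source);
}

$source = 'print "/* a comment */"; /* a real comment */';
echo strip_comments_naive($source), "\n";
echo strip_comments_outside_strings($source), "\n";
```

Even the guarded version only knows about double-quoted strings. Single-quoted strings, heredocs, and comment markers inside // comments would each need their own alternative, which is where a real tokenizer becomes the simpler choice.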

Answer by makaveli_lcf

//  Removes multi-line comments and does not create
//  a blank line, also treats white spaces/tabs.
//  The m modifier makes ^ match at every line start, not only at the start of the text.
$text = preg_replace('!^[ \t]*/\*.*?\*/[ \t]*[\r\n]!sm', '', $text);

//  Removes single line '//' comments, treats blank characters
$text = preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $text);

//  Strip blank lines
$text = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $text);

Answer by soulmerge

It is possible, but I wouldn't do it. You need to parse the whole PHP file to make sure that you're not removing any necessary whitespace (inside strings, between keywords and identifiers, as in public function doStuff() collapsing to publicfunctiondoStuff(), etc.). Better to use the tokenizer extension of PHP.

Answer by St. John Johnson

This should work for replacing everything from /* to */.

$string = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);
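
A short sketch of what this pattern does and does not match may help. The function name and inputs are assumptions for illustration. The pattern requires whitespace on both sides of the comment, and [^\/]* cannot cross a slash inside the comment body.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function strip_whitespace_wrapped_comments(string $string): string
{
    return preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);
}

// A comment surrounded by whitespace and without "/" inside is removed.
var_dump(strip_whitespace_wrapped_comments("a\n/* plain */\nb") === "a\nb");

// A comment whose body contains "/" (for example a URL) is left alone.
$withSlash = "a\n/* see http://example.com */\nb";
var_dump(strip_whitespace_wrapped_comments($withSlash) === $withSlash);
```

A comment at the very start of the text, with no whitespace before it, is also left in place by this pattern.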

Answer by Federico Biccheddu

$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);
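
As a rough illustration of the pattern above: it relies on negated character classes rather than the dot, so it spans newlines without the s modifier. The function name and sample string below are assumptions made for this sketch.

```php
<?php
// Hypothetical wrapper around the pattern from the answer. Negated
// character classes match newlines, so no "s" modifier is needed.
function strip_c_style_comments(string $string): string
{
    return preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);
}

$source = "a /* x\n * y\n **/ b /* second */ c";
echo strip_c_style_comments($source), "\n";
```

Each comment is matched separately, so the text between two comments is kept.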

Answer by giuseppe

This is my solution, for anyone not used to regular expressions. The following code removes every comment delimited by # and retrieves variable values written in the style NAME=VALUE.

$reg = array();
$handle = @fopen("/etc/chilli/config", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Drop everything from the first "#" to the end of the line.
        $start = strpos($buffer, "#");
        $res = ($start !== false) ? substr($buffer, 0, $start) : $buffer;

        // Split NAME=VALUE at the first "=" only, so values may contain "=".
        $a = explode("=", $res, 2);
        $name = trim($a[0]);
        if ($name !== "") {
            // A name without "=" is stored with an empty value.
            $reg[$name] = isset($a[1]) ? trim($a[1]) : "";
        }
    }

    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

Answer by Eduardo Cuomo

This is a good function, and WORKS!

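<?php
// Compatibility shim from the PHP manual: T_ML_COMMENT does not exist in
// PHP 5 and later, and T_DOC_COMMENT was introduced in PHP 5.
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}
function strip_comments($source) {
    $tokens = token_get_all($source);
    $ret = "";
    foreach ($tokens as $token) {
       if (is_string($token)) {
          // Simple one-character token such as ";" or "{"
          $ret.= $token;
       } else {
          // Token array: [id, text, line number]
          list($id, $text) = $token;

          switch ($id) { 
             case T_COMMENT: 
             case T_ML_COMMENT: // we've defined this
             case T_DOC_COMMENT: // and this
                // Comment token: output nothing
                break;

             default:
                $ret.= $text;
                break;
          }
       }
    }    
    // Remove the PHP tags ('<?php' before '<?', so no stray "php" is left)
    return trim(str_replace(array('<?php', '<?', '?>'), '', $ret));
}
?>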

Now use the function 'strip_comments' on code held in a variable:

<?php
// Single quotes, so the double quotes inside the sample code need no escaping
$code = '
<?php
    /* this is comment */
   // this is also a comment
   # me too, am also comment
   echo "And I am some code...";
?>';

$code = strip_comments($code);

echo htmlspecialchars($code);
?>

This produces the output:

echo "And I am some code...";

Loading from a php file:

<?php
$code = file_get_contents("some_code_file.php");
$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Loading a PHP file, stripping comments, and saving it back:

<?php
$file = "some_code_file.php";
$code = file_get_contents($file);
$code = strip_comments($code);

// Overwrite the original file with the stripped code
$f = fopen($file,"w");
fwrite($f,$code);
fclose($f);
?>

Source: http://www.php.net/manual/en/tokenizer.examples.php

Answer by Rogerio Dalot

I found this one suits me better: (\s+)\/\*([^\/]*)\*/\n* . It removes multi-line comments, indented or not, along with the whitespace around them. Here is an example of a comment that this regex matches.

/**
 * The AdditionalCategory
 * Meta informations extracted from the WSDL
 * - minOccurs : 0
 * - nillable : true
 * @var TestStructAdditionalCategorizationExternalIntegrationCUDListDataContract
 */
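
A minimal sketch of applying this pattern with preg_replace follows. The answer does not say which delimiters or replacement string to use, so the ~ delimiters, the "\n" replacement, the function name, and the sample class are all assumptions made here.

```php
<?php
// Pattern from the answer, wrapped in "~" delimiters because it contains
// an unescaped "/". Replacing with "\n" is an assumption of this sketch.
function strip_indented_comments(string $code): string
{
    return preg_replace('~(\s+)\/\*([^\/]*)\*/\n*~', "\n", $code);
}

$code = "class Foo\n{\n    /**\n     * The AdditionalCategory\n     * - minOccurs : 0\n     */\n    public \$AdditionalCategory;\n}\n";
echo strip_indented_comments($code);
```

Replacing with an empty string instead would join the line before the comment with the line after it, because (\s+) also consumes the preceding line break. As with the similar pattern above, a comment containing a slash is not matched.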