PHP file_get_contents() breaks UTF-8 characters

Question

Asked by Richard Knop

I am loading HTML from an external server. The HTML markup has UTF-8 encoding and contains characters such as ?,?,?,?,? etc. When I load the HTML with file_get_contents() like this:

$html = file_get_contents('http://example.com/foreign.html');

It messes up the UTF-8 characters and loads ?, ?, ¤ and similar nonsense instead of proper UTF-8 characters.

How can I solve this?

UPDATE:

I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, so file_get_contents() must already be returning broken HTML.

UPDATE2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
<title>Test</title>

</head>
<body>


<?php

$html = file_get_contents('http://example.com');
echo htmlentities($html);

?>

</body>
</html>

Answer 1

Accepted answer by Richard Knop

Alright. I have found out that file_get_contents() is not causing this problem. There's a different reason, which I talk about in another question. Silly me.

See this question: Why Does DOM Change Encoding?
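
The linked question is not reproduced here, so what follows is only a guess at the kind of issue it covers. Assuming the DOM in question is PHP's DOMDocument, loadHTML() typically treats markup as ISO-8859-1 unless it finds an encoding hint it recognizes. A minimal sketch of a commonly used workaround, prefixing an XML encoding declaration; the helper name load_utf8_html and the sample markup are hypothetical:

```php
<?php
// Hypothetical helper: parse a UTF-8 HTML string with DOMDocument.
function load_utf8_html(string $html): DOMDocument
{
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML declaration acts as an encoding hint for the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = load_utf8_html("<p>\u{0161}\u{010D}</p>");
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The bytes returned by file_get_contents() are left untouched here; only the way the parser interprets them changes.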

Answer 2

Answered by ugniesdebesys

I had a similar problem with the Polish language.

I tried:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));

I tried:

$fileEndEnd = utf8_encode ( $fileEndEnd );

I tried:

$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd );

And then:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");

This last one worked perfectly!
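
A note for newer PHP versions: the 'HTML-ENTITIES' pseudo-encoding of mbstring is deprecated as of PHP 8.2, as is utf8_encode(). A minimal sketch of one possible replacement, mb_encode_numericentity(), which emits numeric rather than named entities for every non-ASCII character; the helper name and the sample string are illustrative and not part of the answer:

```php
<?php
// Hypothetical helper: replace non-ASCII characters with numeric HTML entities.
function to_numeric_entities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// ASCII letters pass through; the Polish letters become &#...; entities.
echo to_numeric_entities("za\u{017C}\u{00F3}\u{0142}\u{0107}"), "\n";
```

Because the result is pure ASCII, it displays the same whatever charset the page declares, which is probably why this kind of conversion worked where the others did not.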

Answer 3

Answered by Gordon

A solution suggested in the comments of the PHP manual entry for file_get_contents:

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

You might also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
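
mb_detect_encoding() is a heuristic, so a slightly more defensive variant is to validate first and convert only when validation fails. A minimal sketch, under the assumption that any input which is not valid UTF-8 is ISO-8859-1 (this will be wrong for other legacy encodings); the function name ensure_utf8 is hypothetical:

```php
<?php
// Hypothetical helper: return $content unchanged if it is already valid UTF-8,
// otherwise convert it on the ASSUMPTION that it is ISO-8859-1.
function ensure_utf8(string $content): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    return mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
}

// "\xE9" is "é" in ISO-8859-1; "\xC3\xA9" is the same letter in UTF-8.
echo bin2hex(ensure_utf8("\xE9")), "\n";
echo bin2hex(ensure_utf8("\xC3\xA9")), "\n";
```

Checking validity first avoids converting content that is already UTF-8 a second time.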

Answer 4

Answered by Dr. Dama

I think you simply have a double conversion of the character type there :D

That may be because you opened an HTML document within an HTML document, so in the end you have something that looks like this:

<!DOCTYPE html> 
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<!DOCTYPE html> 
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Test</title>.......

The use of mb_detect_encoding may therefore lead you to other issues.
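
To make the idea of a double conversion concrete, here is a small sketch of what happens when bytes that are already UTF-8 are treated as ISO-8859-1 and converted "to UTF-8" a second time. The sample character and variable names are illustrative only:

```php
<?php
$utf8 = "\u{00E9}"; // "é", stored as the two UTF-8 bytes C3 A9

// Mistake: treat the UTF-8 bytes as ISO-8859-1 and convert them again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), ' -> ', bin2hex($double), "\n";

// Reversing the extra step restores the original bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);
```

Each original byte is re-encoded as its own character, which is where sequences such as "Ã©" in place of a single accented letter come from.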

Answer 5

Answered by Mohammad H.

Try this too:

$url = 'http://www.domain.com/';
$html = file_get_contents($url);

// iconv() takes the source encoding first: this converts from UTF-8 to ISO-8859-1
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);

Answer 6

Answered by Mustafa Ergüven

In Turkish, neither mb_convert_encoding nor any other charset conversion worked.

urlencode did not work either, because it converts the space character to +. For percent-encoding it must be %20.

This one worked!

   $url = rawurlencode($url);
   $url = str_replace("%3A", ":", $url);
   $url = str_replace("%2F", "/", $url);

   $data = file_get_contents($url);
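
Encoding the whole URL and then restoring ":" and "/" also percent-encodes characters such as "?", "&" and "=" if the URL has a query string. A minimal alternative sketch that encodes only the path, one segment at a time; the helper name encode_path and the sample path are hypothetical:

```php
<?php
// Hypothetical helper: percent-encode each path segment and keep the "/" separators.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Spaces become %20 and non-ASCII bytes become %XX sequences.
$url = 'http://example.com' . encode_path("/dosya ad\u{0131}/\u{00E7}ay.html");
echo $url, "\n";
```

The scheme and host are concatenated unencoded, so there is nothing to restore with str_replace afterwards.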

Answer 7

Answered by Dorian PIERREFEU

Example:

$string = file_get_contents(".../File.txt");
$string = mb_convert_encoding($string, 'UTF-8', "ISO-8859-1");
echo $string;

Answer 8

Answered by Albert

I had a similar problem; what solved it was html_entity_decode.

My code is:

$content = file_get_contents("http://example.com/fr");
$x = new SimpleXMLElement($content);
foreach ($x->channel->item as $entry) {
    // Cast the SimpleXMLElement node to a string before decoding its entities
    $subEntry = html_entity_decode((string) $entry->description);
}

Here I am retrieving an XML file (in French), which is why I use the $x object variable. Only after parsing it do I decode each description into the variable $subEntry.

I tried mb_convert_encoding but this didn't work for me.
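
One detail worth knowing when using html_entity_decode: its default output encoding was ISO-8859-1 before PHP 5.4 and UTF-8 afterwards, so passing the encoding explicitly removes the guesswork. A minimal sketch with an illustrative French string that is not taken from the answer:

```php
<?php
// Sample input with named entities (illustrative only).
$encoded = 'caf&eacute; &amp; cr&egrave;me';

// Decode to UTF-8 explicitly instead of relying on the version-dependent default.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";
```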

Answer 9

Answered by Juergen

Try this function:

function mb_html_entity_decode($string) {
    if (extension_loaded('mbstring') === true) {
        mb_language('Neutral');
        mb_internal_encoding('UTF-8');
        mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));

        return mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
    }

    return html_entity_decode($string, ENT_COMPAT, 'UTF-8');
}

Answer 10

Answered by matasoy

I am working with 35000 lines of data.

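$f = fopen("veri1.txt", "r");
$i = 0; // number of lines read
// Loop on fgets() itself; a !feof() loop can pass false to the converter at end of file.
while (($line = fgets($f)) !== false) {
    $i++;
    // Convert non-ASCII UTF-8 characters to HTML entities.
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);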

This code converts my strange characters into normal ones.
