Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/713293/

Date: 2020-08-24 23:36:35  Source: igfitidea

php: file_get_contents encoding problem

Tags: php, encoding, file-get-contents

Asked by Jamol

My task is simple: make a POST request to translate.google.com and get the translation. In the following example I'm translating the word "hello" into Russian.

header('Content-Type: text/plain; charset=utf-8');  // optional
error_reporting(E_ALL | E_STRICT);

$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => implode("\r\n", array(
            'Content-type: application/x-www-form-urlencoded',
            'Accept-Language: en-us,en;q=0.5', // optional
            'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' // optional
        )),
        'content' => http_build_query(array(
            'prev'  =>  '_t',
            'hl'    =>  'en',
            'ie'    =>  'UTF-8',
            'text'  =>  'hello',
            'sl'    =>  'en',
            'tl'    =>  'ru'
        ))
    )
));

$page = file_get_contents('http://translate.google.com/translate_t', false, $context);

require '../simplehtmldom/simple_html_dom.php';
$dom = str_get_html($page);
$translation = $dom->find('#result_box', 0)->plaintext;
echo $translation;

The lines marked as optional are those whose removal leaves the output unchanged. But I'm getting weird characters...

??????

I tried

echo mb_convert_encoding($translation, 'UTF-8');

But I get

Dòé×??

Does anybody know how to solve this problem?

UPDATE:

  1. Forgot to mention that all my PHP files are encoded in UTF-8 without BOM
  2. When I change the "to" language to "en", that is, translate from English to English, it works OK.
  3. I do not think the library I'm using is messing it up, because I tried to output the whole $page without passing it to the library functions.
  4. I'm using PHP 5
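
One way to check point 3 without relying on how a browser or terminal renders the text is to look at the raw bytes. The following is only a sketch: the `$page` value is a made-up stand-in, and KOI8-R is used purely as an illustrative assumption about what a non-UTF-8 response might contain.

```php
<?php
// Hypothetical stand-in for $page: the word "привет" as KOI8-R bytes.
// In the real script this would be the string returned by file_get_contents().
$page = mb_convert_encoding('привет', 'KOI8-R', 'UTF-8');

// Reports whether the bytes form a valid UTF-8 sequence.
var_dump(mb_check_encoding($page, 'UTF-8'));

// Hex dump of the first bytes, so the result does not depend on
// the charset the browser or terminal uses for display.
echo bin2hex(substr($page, 0, 6)), "\n";
```

If the check reports that the bytes are not valid UTF-8, the response needs converting from its real encoding before it is echoed under a UTF-8 Content-Type header.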

Answered by Alekc

See whether this post helps: "CURL import character encoding problem".

You can also try this snippet (taken from php.net):

<?php
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
?>
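
The snippet above guesses the encoding from the bytes alone. When the server declares a charset in its Content-Type response header, that declaration can be used as the source encoding instead. Below is a minimal sketch under that assumption. The helper names `charset_from_headers` and `to_utf8` are hypothetical, the sample headers and body are made up, and the header array follows the shape PHP exposes as `$http_response_header` after an HTTP `file_get_contents` call.

```php
<?php
// Hypothetical helper: extract the charset from raw response header lines
// (the shape of $http_response_header). Returns null when none is declared.
function charset_from_headers(array $headers)
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*;\s*charset=["\']?([\w.:-]+)/i', $line, $m)) {
            $charset = $m[1]; // keep the last match, e.g. after redirects
        }
    }
    return $charset;
}

// Hypothetical helper: convert $body to UTF-8 using the declared charset,
// falling back to byte-based detection when nothing was declared.
function to_utf8($body, array $headers)
{
    $from = charset_from_headers($headers);
    if ($from === null) {
        $from = mb_detect_encoding($body, 'UTF-8, ISO-8859-1', true);
    }
    return mb_convert_encoding($body, 'UTF-8', $from);
}

// Self-contained demonstration with a made-up response (no network needed).
$headers = array('HTTP/1.0 200 OK', 'Content-Type: text/html; charset=KOI8-R');
$body    = mb_convert_encoding('привет', 'KOI8-R', 'UTF-8'); // simulate server bytes
echo to_utf8($body, $headers), "\n";
```

In the question's script the call would look like `to_utf8($page, $http_response_header)`. A charset name that mbstring does not recognise would make `mb_convert_encoding` fail, so a real version would need to handle that case.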

Answered by Calvin

First off, is your browser set to UTF-8? In Firefox you can set your text encoding in View->Character Encoding. Make sure you have "Unicode (UTF-8)" selected. I would also set View->Character Encoding->Auto-Detect to "Universal."

Secondly, you could try passing the FILE_TEXT flag, like so:

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);

Answered by Milan Babuškov

Accept-Charset is not really that optional. You should specify UTF-8 there. Russian characters are not valid in ISO-8859-1.
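
As a sketch of that suggestion, the request headers from the question could ask for UTF-8 only. Whether the server honours Accept-Charset is an assumption here, so the response may still need converting. The shortened form fields are for illustration only.

```php
<?php
// Same header block as in the question, but asking for UTF-8 explicitly.
$headers = implode("\r\n", array(
    'Content-type: application/x-www-form-urlencoded',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: utf-8', // changed from the question: UTF-8 only
));

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => $headers,
        // Shortened form fields, for illustration only.
        'content' => http_build_query(array('text' => 'hello', 'sl' => 'en', 'tl' => 'ru')),
    ),
));

// The context only stores options; no request is sent until it is used.
$options = stream_context_get_options($context);
echo $options['http']['header'], "\n";
```

The resulting `$context` can be passed to `file_get_contents` exactly as in the question.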
