PHP Curl UTF-8 Charset

Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/10761411/

Date: 2020-08-24 22:59:09  Source: igfitidea

Tags: php, utf-8, character-encoding

Asked by Bora Alp Arat

I have a PHP script that fetches another web page and writes out all of its HTML. Everything works, but there is a charset problem. My PHP file is encoded as UTF-8, and all my other PHP files work fine (so the server is not the problem). What is missing from this code? All Spanish letters look weird. PS: when I type the original versions of these weird characters directly into the PHP file, they display correctly.

header("Content-Type: text/html; charset=utf-8");
function file_get_contents_curl($url)
{
    $ch=curl_init();
    curl_setopt($ch,CURLOPT_HEADER,0);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
    $data=curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html=file_get_contents_curl($_GET["u"]);
$doc=new DOMDocument();
@$doc->loadHTML($html);

Answer by julio

Simple: when you use curl, it encodes the string to UTF-8; you just need to decode it.

Description

string utf8_decode ( string $data )

This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.
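
Note that utf8_decode() is deprecated as of PHP 8.2. A minimal sketch of the same UTF-8 to ISO-8859-1 conversion with mb_convert_encoding(), assuming the mbstring extension is loaded; the sample string is an assumption chosen for illustration:

```php
<?php
// Assumed sample input: "Español" as UTF-8 bytes (ñ is 0xC3 0xB1 in UTF-8).
$utf8 = "Espa\xC3\xB1ol";

// Convert UTF-8 to ISO-8859-1, the same direction utf8_decode() works in.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// In ISO-8859-1, ñ is the single byte 0xF1, so the result is one byte shorter.
var_dump($latin1 === "Espa\xF1ol");
var_dump(strlen($utf8), strlen($latin1));
```

This only helps when the page that finally displays the text is itself served as ISO-8859-1; if the output page declares UTF-8, decoding to ISO-8859-1 is likely to make things worse.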

Answer by amir rasabeh

You can use a header:

   header('Content-type: text/html; charset=UTF-8');

and then decode the string:

 $page = utf8_decode(curl_exec($ch));

It worked for me.

Answer by Engin Zeybekoğlu

function page_title($val){
    // include_once: a plain include would redeclare the library on a second call
    include_once(dirname(__FILE__).'/simple_html_dom.php');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,$val);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
    curl_setopt($ch, CURLOPT_ENCODING , "gzip");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $return = curl_exec($ch);
    $encot = false;
    $c = '';
    // Content-Type response header, e.g. "text/html; charset=ISO-8859-1"
    $charset = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    curl_close($ch);
    $html = str_get_html('"'.$return.'"');
    if(!$html) return false; // str_get_html() returns false when it cannot parse the input

    // Prefer the charset from the HTTP header, whatever the media type is
    if(preg_match('/charset=["\']?([\w-]+)/i', $charset, $found)) {
        $c = $found[1];
        $encot = true;
    }
    else {
        // Fall back to <meta http-equiv="Content-Type" content="...; charset=...">
        $lookat=$html->find('meta[http-equiv=Content-Type]',0);
        if($lookat && preg_match('/charset=([\w-]+)/i', $lookat->content, $found)) {
            $c = $found[1];
            $encot = true;
        }
    }
    $titles = $html->find('title');
    if(empty($titles)) return false; // the page has no <title>
    $title = $titles[0]->innertext;
    // Convert only when a non-UTF-8 charset was detected (case-insensitive compare)
    if($encot == true && strcasecmp($c, 'utf-8') != 0) $title = mb_convert_encoding($title,'UTF-8',$c);

    return $title;
}
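
The header-parsing step in the function above can be isolated and checked without any network access. A minimal sketch, where charset_from_content_type() is a hypothetical helper name and not part of this answer or of cURL:

```php
<?php
// Hypothetical helper (not part of the answer above): pull the charset
// parameter out of a Content-Type header value, or return null if absent.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Matches charset=utf-8, charset="utf-8" and Charset=UTF-8 alike.
    if (preg_match('/charset\s*=\s*["\']?([^"\';\s]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(charset_from_content_type('text/html; charset=ISO-8859-1'));
var_dump(charset_from_content_type('text/html'));
```

Matching the charset parameter with a regular expression, rather than stripping a fixed "text/html; charset=" prefix, also covers other media types and different letter case.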

Answer by Taron

$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);
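
The iconv() call above only gives the right result when the source page really is Windows-1251. A small offline sketch of the same conversion, using an assumed sample byte string instead of a cURL response, and checking the false return value that iconv() gives on failure:

```php
<?php
// Assumed sample input: the Russian word "Привет" encoded as Windows-1251
// (one byte per letter).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

$utf8 = iconv('Windows-1251', 'UTF-8', $cp1251);
if ($utf8 === false) {
    // iconv() returns false when the conversion fails.
    throw new RuntimeException('Conversion from Windows-1251 failed');
}

// Each Cyrillic letter takes two bytes in UTF-8: 6 bytes in, 12 bytes out.
var_dump(strlen($cp1251), strlen($utf8));
echo $utf8, "\n";
```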

Answer by michalzuber

I was fetching a windows-1252 encoded file via cURL, and mb_detect_encoding(curl_exec($ch)); returned UTF-8. I tried utf8_encode(curl_exec($ch)); and the characters were correct.
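
Two caveats apply here: utf8_encode() converts from ISO-8859-1, which differs from Windows-1252 in the 0x80 to 0x9F byte range, and it is deprecated as of PHP 8.2. A hedged alternative is to test whether the bytes are already valid UTF-8 and convert only if they are not. In this sketch, to_utf8() is a hypothetical helper name, and treating every non-UTF-8 input as Windows-1252 is an assumption, not a detection:

```php
<?php
// Hypothetical helper: return $raw as UTF-8, assuming that anything which is
// not already valid UTF-8 is in $fallback (Windows-1252 by default).
function to_utf8(string $raw, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave the bytes untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// "café" in Windows-1252 (é is the single byte 0xE9) becomes UTF-8.
var_dump(to_utf8("caf\xE9") === "caf\xC3\xA9");
// Input that is already UTF-8 is returned unchanged.
var_dump(to_utf8("caf\xC3\xA9") === "caf\xC3\xA9");
```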

Answer by MAChitgarha

First method (internal function)

The best way I have tried is to use urlencode(). Keep in mind: don't use it on the whole URL; use it only on the parts that need it. For example, if a request has two fields, 'text-fa' and 'text-en', containing a Persian and an English text respectively, you might only need to encode the Persian text, not the English one.
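
A minimal sketch of that idea, encoding only the parameter values and leaving the rest of the URL alone. The endpoint and field values are assumptions for illustration:

```php
<?php
// Assumed endpoint and field values, for illustration only.
$base   = 'https://example.com/translate';
$textFa = "\u{0633}\u{0644}\u{0627}\u{0645}"; // the Persian word "salam"
$textEn = 'hello';

// Encode each value separately; the scheme, host, path, "?" and "&"
// must stay unencoded or the URL stops being a URL.
$url = $base
     . '?text-fa=' . urlencode($textFa)
     . '&text-en=' . urlencode($textEn);

echo $url, "\n";
```

Encoding the English value as well is harmless, since urlencode() leaves plain ASCII letters unchanged.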

Second method (using cURL function)

However, there are better ways if the range of characters that have to be encoded is more limited. One of them is to pass CURLOPT_ENCODING to curl_setopt():

curl_setopt($ch, CURLOPT_ENCODING, "");
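
As the PHP manual describes it, CURLOPT_ENCODING sets the Accept-Encoding request header and enables decoding of the response; an empty string offers every encoding cURL supports. That concerns compression (gzip, deflate), not character sets, so the option may not be enough on its own to fix a charset problem. A small offline sketch of the distinction, assuming the zlib and mbstring extensions are loaded and using an assumed sample body in place of a real response:

```php
<?php
// Assumed sample body: "Español" in ISO-8859-1 (ñ is the single byte 0xF1).
$body = "Espa\xF1ol";

// Simulate what a server does when it sends Content-Encoding: gzip ...
$compressed = gzencode($body);
// ... and what cURL undoes when CURLOPT_ENCODING is set.
$decoded = gzdecode($compressed);

// The decompressed bytes are identical to the original body,
var_dump($decoded === $body);
// and they are still ISO-8859-1, not valid UTF-8.
var_dump(mb_check_encoding($decoded, 'UTF-8'));
```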