Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/649480/

Time: 2020-08-24 23:22:41 Source: igfitidea

CURL import character encoding problem

php, encoding, curl

Asked by Rid Iculous

I'm using CURL to import some code. However, in French, all the characters come out funny. For example: Bonjour? ...

I don't have access to change anything on the imported code. Is there anything I can do my side to fix this?

Thanks

Answered by Alekc

As Jon Skeet pointed out, it's difficult to understand your situation; however, if you only have access to the final text, you can try using iconv to change the text encoding.

E.g.:

$text = iconv("Windows-1252","UTF-8",$text);

I had a similar issue some time ago (with Italian text and special characters) and solved it this way.

Try different combinations (UTF-8, ISO-8859-1, Windows-1252).
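A minimal sketch of the iconv approach, assuming the remote text is Windows-1252 and your page is UTF-8 (both charsets are guesses you would confirm by trying the combinations mentioned above). The helper name to_utf8() and the sample string are hypothetical; the helper checks iconv()'s return value because iconv() returns false when the input is not valid in the source charset.

```php
<?php
// Hypothetical helper: re-encode text from a guessed source charset to UTF-8.
function to_utf8(string $text, string $sourceCharset): string
{
    // @ hides iconv's notice; the failure is reported by the exception instead.
    $converted = @iconv($sourceCharset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Input is not valid $sourceCharset");
    }
    return $converted;
}

// Example input: "Bonjour à tous" as Windows-1252 bytes, where "à" is the single byte 0xE0.
$raw = "Bonjour \xE0 tous";
// After conversion, "à" is the UTF-8 byte pair 0xC3 0xA0.
echo to_utf8($raw, 'Windows-1252'), PHP_EOL;
```

If the output still looks wrong, swap 'Windows-1252' for the next candidate charset rather than converting the same string twice.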

Answered by Rid Iculous

I had a similar problem. I tried to loop through all combinations of input and output charsets. Nothing helped! :(

However, I was able to access the code that actually fetched the data, and this is where the culprit lay. The data was fetched via cURL. Adding

 curl_setopt($ch,CURLOPT_BINARYTRANSFER,true);

fixed it.
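For reference, the PHP manual documents that CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3: once CURLOPT_RETURNTRANSFER is set, curl_exec() already returns the raw response bytes. A minimal fetch sketch that keeps those bytes untouched and also returns the Content-Type header, so you can tell which charset to convert from (fetch_raw() is a hypothetical helper name, and the URL in the usage comment is a placeholder):

```php
<?php
// Hypothetical helper: fetch a URL, returning [raw body bytes, Content-Type header].
// No charset conversion happens here; that is left to iconv() afterwards.
function fetch_raw(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // e.g. "text/html; charset=ISO-8859-1"; null if the server sent no valid header
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return [$body, $contentType];
}

// Usage: [$body, $contentType] = fetch_raw('https://example.com/');
```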

A handy piece of code for trying out every combination from a list of charsets:

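// $text_2_convert holds the string to test, e.g. the result of curl_exec().
$charsets = array(
        "UTF-8",
        "ASCII",
        "Windows-1252",
        "ISO-8859-15",
        "ISO-8859-1",
        "ISO-8859-6",
        "CP1256"
        );

// Print every source/target combination so the readable one stands out.
foreach ($charsets as $ch1) {
    foreach ($charsets as $ch2) {
        // @ hides the notice iconv() raises when the input is invalid in $ch1.
        $converted = @iconv($ch1, $ch2, $text_2_convert);
        echo "<h1>Combination $ch1 to $ch2 produces: </h1>"
            . ($converted === false ? "(conversion failed)" : $converted);
    }
}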
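If the page you display is UTF-8 (an assumption), only the source charset is unknown, so a single loop that converts each candidate to UTF-8 and drops the ones iconv() rejects may be enough. A sketch with a hypothetical helper name; note that single-byte charsets such as ISO-8859-1 accept every byte, so they always appear in the results and you still have to eyeball which output reads correctly.

```php
<?php
// Hypothetical helper: try each charset as the *source*, converting to UTF-8.
// Returns [charset => converted text] for the conversions that succeeded.
function utf8_candidates(string $raw, array $charsets): array
{
    $results = [];
    foreach ($charsets as $charset) {
        // @ hides the notice iconv() raises when $raw is invalid in $charset.
        $converted = @iconv($charset, 'UTF-8', $raw);
        if ($converted !== false) {
            $results[$charset] = $converted;
        }
    }
    return $results;
}

$charsets = ['UTF-8', 'ASCII', 'Windows-1252', 'ISO-8859-15', 'ISO-8859-1', 'ISO-8859-6', 'CP1256'];
// Sample input (hypothetical): "à" as the Windows-1252 / ISO-8859-1 byte 0xE0.
foreach (utf8_candidates("Bonjour \xE0 tous", $charsets) as $charset => $text) {
    echo "<h1>$charset: </h1>", htmlspecialchars($text), PHP_EOL;
}
```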

Answered by Ben

You could replace your

$data = curl_exec($ch);

with

$data = utf8_decode(curl_exec($ch));

I had this same issue and it worked well for me.
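Note that utf8_decode() is deprecated as of PHP 8.2. It converts UTF-8 to ISO-8859-1 and silently replaces characters Latin-1 cannot represent with "?". A sketch of an iconv()-based replacement (the name utf8_to_latin1() is hypothetical) that reports such characters instead of hiding them:

```php
<?php
// Hypothetical replacement for utf8_decode(): UTF-8 in, ISO-8859-1 (Latin-1) bytes out.
// Unlike utf8_decode(), it fails loudly instead of substituting "?".
function utf8_to_latin1(string $utf8): string
{
    // @ hides iconv's notice; the failure is reported by the exception instead.
    $latin1 = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($latin1 === false) {
        throw new RuntimeException('Invalid UTF-8, or a character Latin-1 cannot represent');
    }
    return $latin1;
}

// Usage, mirroring the answer above:
// $data = utf8_to_latin1(curl_exec($ch));
```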

Answered by Ben

PHP seems to use UTF-8 by default, so I found that the following works:

$text = iconv("UTF-8","Windows-1252",$text);
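This conversion only displays correctly if the page itself is delivered as Windows-1252, so it is worth declaring that charset explicitly. A sketch, assuming the imported text is UTF-8; the helper name is hypothetical. It also hints at why Windows-1252 is often a better guess than ISO-8859-1 for Western European text: Windows-1252 has characters such as the euro sign (byte 0x80) that ISO-8859-1 lacks.

```php
<?php
// Hypothetical helper: convert UTF-8 text for a page served as Windows-1252.
function utf8_to_cp1252(string $utf8): string
{
    // @ hides iconv's notice; the failure is reported by the exception instead.
    $converted = @iconv('UTF-8', 'Windows-1252', $utf8);
    if ($converted === false) {
        throw new RuntimeException('Text contains characters Windows-1252 cannot represent');
    }
    return $converted;
}

// In a web request, declare the charset that matches the bytes you echo:
// header('Content-Type: text/html; charset=Windows-1252');
echo bin2hex(utf8_to_cp1252("\xE2\x82\xAC")), PHP_EOL; // "€" in UTF-8 -> 80
```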

Answered by rmontagud

I'm currently dealing with a similar problem: I'm trying to write a simple HTML <title> importer via cURL. Here's an outline of what I've done so far:

  1. Retrieve the HTML via cURL
  2. Check whether the response headers give any hint of the encoding via curl_getinfo() and match it with a regex
  3. Parse the HTML to look at the Content-Type meta tag and the <title> tag (yes, I know the consequences)
  4. Compare the two content types (header and meta) and choose the meta one if they differ, because we know no one cares about their httpd configuration and there are a lot of dirty workarounds that rely on it
  5. iconv() the string
  6. Wish every day that $DEITY punishes anyone who doesn't follow the standards until the end of days, because it would save me the meta parsing
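Steps 2 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the function names are hypothetical, the regexes are a deliberately rough stand-in for a real HTML parser, and falling back to UTF-8 when nothing is declared is an assumption. In real use, $html would come from curl_exec() and $contentType from curl_getinfo($ch, CURLINFO_CONTENT_TYPE).

```php
<?php
// Step 2 (hypothetical helper): pull "charset=..." out of a Content-Type header value.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Step 3 (hypothetical helper): matches <meta charset="..."> and
// <meta http-equiv="Content-Type" content="text/html; charset=...">.
function charset_from_meta(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

function import_title(string $html, ?string $contentType): ?string
{
    // Step 4: prefer the meta charset; fall back to the header, then to UTF-8 (assumption).
    $charset = charset_from_meta($html)
        ?? charset_from_content_type($contentType)
        ?? 'UTF-8';

    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }
    // Step 5: convert only the extracted title to UTF-8.
    $title = @iconv($charset, 'UTF-8', $m[1]);
    return $title === false ? null : trim($title);
}

// Sample page whose header claims UTF-8 while the body is really ISO-8859-1.
$html = "<html><head><meta charset=\"ISO-8859-1\"><title>Caf\xE9</title></head></html>";
var_dump(import_title($html, 'text/html; charset=UTF-8')); // the meta charset wins
```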