PHP output showing little black diamonds with a question mark
Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/275411/
Question
I'm writing a PHP program that pulls from a database source. Some of the varchars have quotes that are displaying as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER, I assume from Microsoft Word text).
How can I use PHP to strip these characters out?
Answer
If you see that character (� U+FFFD "REPLACEMENT CHARACTER") it usually means that the text itself is encoded in some form of single-byte encoding but interpreted in one of the Unicode encodings (UTF-8 or UTF-16).
If it were the other way around it would (usually) look something like this: Ã¤.
Probably the original encoding is ISO-8859-1, also known as Latin-1. You can check this without having to change your script: Browsers give you the option to re-interpret a page in a different encoding -- in Firefox use "View" -> "Character Encoding".
To make the browser use the correct encoding, add an HTTP header like this:
header("Content-Type: text/html; charset=ISO-8859-1");
or put the encoding in a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Alternatively you could try to read from the database in another encoding (UTF-8, preferably) or convert the text with iconv().
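A minimal sketch of the iconv() route, assuming the stored text really is ISO-8859-1 and the page is meant to be served as UTF-8. The helper name latin1_to_utf8 is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper: assumes $text holds ISO-8859-1 bytes.
function latin1_to_utf8(string $text): string
{
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    return $converted;
}

// "\xE4" is "ä" in ISO-8859-1; in UTF-8 it becomes the two bytes C3 A4.
echo bin2hex(latin1_to_utf8("\xE4")), "\n"; // prints "c3a4"
```

If the source encoding is something other than ISO-8859-1, the first argument to iconv() has to change accordingly; converting from the wrong encoding produces readable but incorrect characters rather than an error.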
Answer by troelskn
This is a charset issue. As such, it could have gone wrong on many different levels, but most likely the strings in your database are UTF-8 encoded and you are presenting them as ISO-8859-1. Or the other way around.
The proper way to fix this problem is to get your character sets straight. The simplest strategy, since you're using PHP, is to use iso-8859-1 throughout your application. To do this, you must ensure that:
- All PHP source files are saved as iso-8859-1 (not to be confused with cp-1252).
- Your web server is configured to serve files with charset=iso-8859-1
  - Alternatively, you can override the web server's settings from within the PHP document, using header.
  - In addition, you may insert a meta tag in your HTML that specifies the same thing, but this isn't strictly needed.
- You may also specify the accept-charset attribute on your <form> elements.
- Database tables are defined with encoding as latin1
- The database connection between PHP and the database is set to latin1
If you already have data in your database, you should be aware that it is probably messed up already. If you are not yet in production, just wipe it all and start over. Otherwise you'll have to do some data cleanup.
A note on meta-tags, since everybody misunderstands what they are:
When a web server serves a file (an HTML document), it sends some information that isn't presented directly in the browser. This is known as HTTP headers. One such header is the Content-Type header, which specifies the MIME type of the file (e.g. text/html) as well as the encoding (aka charset).
While most web servers will send a Content-Type header with charset info, it's optional. If it isn't present, the browser will instead interpret any meta tags with http-equiv="Content-Type". It's important to realise that the meta tag is only interpreted if the web server doesn't send the header. In practice this means that it's only used if the page is saved to disk and then opened from there.
This page has a very good explanation of these things.
Answer by Kai Noack
I also faced this � issue. Since then I have run into three cases where it happened:
- substr(): I was using substr() on a UTF-8 string, which cut a UTF-8 character in the middle, so the cut characters could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead.
- htmlspecialchars(): Another problem was using htmlspecialchars() on a UTF-8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');
- preg_replace(): Lastly I found out that preg_replace() can lead to problems with UTF-8. The code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); for example transformed the UTF-8 string "F(×)=2×-3" into "F � 2� ". The fix is to use mb_ereg_replace() instead.
I hope this additional information will help to get rid of such problems.
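The byte-versus-character difference behind the substr() and preg_replace() cases can be sketched as follows. The strings are written as explicit UTF-8 byte escapes so the example does not depend on how the file is saved. The /u pattern modifier shown at the end is an alternative to mb_ereg_replace() that the answer itself does not mention, and the variable names are illustrative only.

```php
<?php
// "\xC3\xA4\xC3\xB6\xC3\xBC" is "äöü" written as explicit UTF-8 bytes.
$utf8 = "\xC3\xA4\xC3\xB6\xC3\xBC";

// substr() counts bytes, so it cuts "ä" in half and leaves invalid UTF-8.
$byteCut = substr($utf8, 0, 1);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)

// mb_substr() counts characters when told the string is UTF-8.
$charCut = mb_substr($utf8, 0, 1, 'UTF-8');
var_dump($charCut === "\xC3\xA4"); // bool(true)

// With the /u modifier, PCRE treats pattern and subject as UTF-8,
// so "×" (bytes C3 97) is replaced as one character instead of byte by byte.
$cleaned = preg_replace('/[^A-Za-z0-9]/u', ' ', "F(\xC3\x97)=2\xC3\x97-3");
var_dump(mb_check_encoding($cleaned, 'UTF-8')); // bool(true)
```

The same rule applies to htmlspecialchars(): passing the encoding explicitly, as in the list above, avoids relying on whatever default the running PHP version uses.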
Answer by Hamlet Kraskian
As mentioned in earlier answers, this happens because your text has been written to the database in iso-8859-1 encoding, or in some other non-UTF-8 encoding.
So you just need to convert the data to UTF-8 before outputting it.
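$text = "string from database"; // ISO-8859-1 bytes read from the database
$text = utf8_encode($text);     // converts ISO-8859-1 to UTF-8 (deprecated as of PHP 8.2)
echo $text;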
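Calling utf8_encode() on text that is already UTF-8 double-encodes it, and the function is deprecated as of PHP 8.2. A guarded conversion might look like the sketch below. It assumes, as utf8_encode() does, that any input that is not valid UTF-8 is ISO-8859-1; the helper name ensure_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: convert only when the input is not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE4")), "\n";     // prints "c3a4"
echo bin2hex(ensure_utf8("\xC3\xA4")), "\n"; // prints "c3a4"
```

The validity check is a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 as well, so knowing the real source encoding is still the more reliable fix.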
Answer by ptwiggerl
To make sure your MySQL connection is set to UTF-8 (or latin1, depending on what you're using), you can do this:
$con = mysql_connect("localhost","username","password");
mysql_set_charset('utf8',$con);
or use this to check what charset you are using:
$con = mysql_connect("localhost","username","password");
$charset = mysql_client_encoding($con);
echo "The current character set is: $charset\n";
More info here: http://php.net/manual/en/function.mysql-set-charset.php
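The mysql_* functions used above were removed in PHP 7.0. A rough equivalent with the mysqli extension could look like this sketch. The function name connect_with_charset, the credentials, and the database name are placeholders, and utf8mb4 is assumed as the target charset because it is MySQL's full UTF-8 character set.

```php
<?php
// Sketch only: host, credentials and database name are placeholders.
function connect_with_charset(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8mb4'
): mysqli {
    $con = new mysqli($host, $user, $pass, $db);
    // set_charset() tells both the client library and the server which encoding to use.
    if (!$con->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $con->error);
    }
    return $con;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $con = connect_with_charset('localhost', 'username', 'password', 'mydb');
// echo "The current character set is: ", $con->character_set_name(), "\n";
```

Pass 'latin1' as the last argument instead if the rest of the application is kept in ISO-8859-1.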
Answer by Daniel Cassidy
Based on your description of the problem, the data in your database is almost certainly encoded as Windows-1252, and your page is almost certainly being served as ISO-8859-1. These two character sets are equivalent except that Windows-1252 has 27 extra printable characters (in the byte range 0x80 to 0x9F) which are not present in ISO-8859-1, including left and right curly quotes.
Assuming my analysis is correct, the simplest solution is to serve your page as Windows-1252. This will work because all characters that are in ISO-8859-1 are also in Windows-1252. In PHP you can change the encoding as follows:
header('Content-Type: text/html; charset=Windows-1252');
However, you really should check what character encoding you are using in your HTML files and the contents of your database, and take care to be consistent, or convert properly where this is not possible.
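Where converting is preferable to changing the page charset, the difference between the two encodings can be seen on the curly-quote bytes themselves. This is a small sketch under the assumption that the database bytes are Windows-1252 and the output should be UTF-8; the variable names are illustrative.

```php
<?php
// "\x93" and "\x94" are the curly double quotes in Windows-1252.
$fromDatabase = "\x93quoted\x94";

// Treating the bytes as Windows-1252 yields U+201C and U+201D.
$asCp1252 = mb_convert_encoding($fromDatabase, 'UTF-8', 'Windows-1252');
echo bin2hex($asCp1252), "\n";

// Treating the same bytes as ISO-8859-1 yields invisible C1 control characters instead.
$asLatin1 = mb_convert_encoding($fromDatabase, 'UTF-8', 'ISO-8859-1');
echo bin2hex($asLatin1), "\n";
```

This is why text pasted from Microsoft Word tends to lose exactly its quotes and dashes when it is handled as ISO-8859-1.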
Answer by DropHit
I chose to strip these characters out of the string by doing this:
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');
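The two lines above change a global mbstring setting for the rest of the request. One way to contain that is to wrap them in a function that restores the previous setting, as in this sketch. The function name strip_invalid_utf8 is hypothetical, and mb_substitute_character() is used here in place of ini_set() to read and write the same setting.

```php
<?php
// Hypothetical wrapper; restores the previous substitute character afterwards.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();  // remember the current substitute
    mb_substitute_character('none');        // drop invalid bytes instead of replacing them
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);
    return $clean;
}

$clean = strip_invalid_utf8("abc\x93def"); // "\x93" on its own is not valid UTF-8
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Note that this removes the offending bytes rather than recovering the characters they stood for, so curly quotes from Windows-1252 text simply disappear.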
Answer by rk_programmer
Wrap your variable in this function: utf8_encode($your_variable);
Answer by Vishal P Gothi
Please try this:
mb_substr($description, 0, 490, "UTF-8");
Answer by Prasant Kumar
This will help you. Put this inside the <head> tag:
<meta charset="iso-8859-1">

