Original question: http://stackoverflow.com/questions/4583369/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
An apostrophe is rendering as ’. What PHP function will display it as ' ? something_Decode?
Asked by bflora2
I'm grabbing some tweets and printing them out on my site, and curly apostrophes are being rendered as "’". This is not good. What PHP function should I run the string through to get these weird characters to display as something closer to '?
Accepted answer by Floern
You could try the following function:
function htmlallentities($str){
    $res = '';
    $strlen = strlen($str);
    for($i = 0; $i < $strlen; $i++){
        $byte = ord($str[$i]);
        if($byte < 128) // 1-byte char (ASCII): copy unchanged
            $res .= $str[$i];
        elseif($byte < 192); // stray continuation byte (invalid UTF-8): skip it
        elseif($byte < 224) // 2-byte char: 5 bits from the lead byte, 6 from the next
            $res .= '&#'.((63&$byte)*64 + (63&ord($str[++$i]))).';';
        elseif($byte < 240) // 3-byte char: 4 bits from the lead byte, 6 from each following byte
            $res .= '&#'.((15&$byte)*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
        elseif($byte < 248) // 4-byte char: 3 bits from the lead byte, 6 from each following byte
            $res .= '&#'.((15&$byte)*262144 + (63&ord($str[++$i]))*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
        // bytes >= 248 never occur in valid UTF-8 and are dropped
    }
    return $res;
}
Call it like this:
$str = htmlallentities($str);
This converts every non-ASCII UTF-8 character into a numeric HTML entity, so the text displays correctly even when the page is served in a different encoding.
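To make the 3-byte branch concrete, here is a small worked example for the curly apostrophe from the question (U+2019, UTF-8 bytes E2 80 99). It is only an illustrative sketch: the helper name `codepoint_from_3_bytes` is made up here and is not part of Floern's answer.

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence into its code point,
// using the same arithmetic as the 3-byte branch of htmlallentities().
function codepoint_from_3_bytes(string $bytes): int
{
    $b1 = ord($bytes[0]); // lead byte, 1110xxxx: keep the low 4 bits
    $b2 = ord($bytes[1]); // continuation byte, 10xxxxxx: keep the low 6 bits
    $b3 = ord($bytes[2]); // continuation byte, 10xxxxxx: keep the low 6 bits
    return (15 & $b1) * 4096 + (63 & $b2) * 64 + (63 & $b3);
}

$apostrophe = "\xE2\x80\x99";            // U+2019 RIGHT SINGLE QUOTATION MARK
$codepoint  = codepoint_from_3_bytes($apostrophe);
$entity     = '&#' . $codepoint . ';';   // the entity htmlallentities() would emit

// (15 & 0xE2) * 4096 + (63 & 0x80) * 64 + (63 & 0x99) = 8192 + 0 + 25 = 8217
echo $codepoint, ' ', $entity, PHP_EOL;
```

A browser renders `&#8217;` as the curly apostrophe whatever the page encoding is, because the entity itself is plain ASCII.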
Answer by Avindra Goolcharan
I was having trouble with this in Chrome.
Adding
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
to the "head" section fixes it.
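The same charset can also be declared from PHP in the HTTP header, which browsers give priority over the meta tag. A minimal sketch, assuming the tweet text is already UTF-8; the function name `render_tweet_page` and the sample tweet are illustrative and not from the original answer.

```php
<?php
// Hypothetical helper: build a small HTML page that declares UTF-8
// and escapes the tweet text for safe output.
function render_tweet_page(string $tweet): string
{
    // htmlspecialchars() only escapes &, <, > and quotes; the curly
    // apostrophe passes through unchanged as UTF-8 bytes.
    $safe = htmlspecialchars($tweet, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\" />\n"
        . "</head>\n<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

// Declare the encoding in the HTTP header too, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$page = render_tweet_page("It\u{2019}s a <b>great</b> day");
echo $page;
```

With both declarations in place, the browser has no reason to guess the encoding, so the apostrophe's three bytes are read as one character.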
Answer by JaseC
I battled with this for almost a day and then found that this function will work 100% of the time. It works with UTF-8 and Unicode and converts characters beyond the basic ASCII set into their HTML entities. It's good for cleaning up MS Word rubbish.
function filterText($text)
{
    // UTF-8 filter: map UTF-8 byte sequences to named HTML entities
    $conv = array(
        "\xC2\xA0" => '&nbsp;',
        "\xC2\xA1" => '&iexcl;',
        "\xC2\xA2" => '&cent;',
        "\xC2\xA3" => '&pound;',
        "\xC2\xA4" => '&curren;',
        "\xC2\xA5" => '&yen;',
        "\xC2\xA6" => '&brvbar;',
        "\xC2\xA7" => '&sect;',
        "\xC2\xA8" => '&uml;',
        "\xC2\xA9" => '&copy;',
        "\xC2\xAA" => '&ordf;',
        "\xC2\xAB" => '&laquo;',
        "\xC2\xAC" => '&not;',
        "\xC2\xAD" => '&shy;',
        "\xC2\xAE" => '&reg;',
        "\xC2\xAF" => '&macr;',
        "\xC2\xB0" => '&deg;',
        "\xC2\xB1" => '&plusmn;',
        "\xC2\xB2" => '&sup2;',
        "\xC2\xB3" => '&sup3;',
        "\xC2\xB4" => '&acute;',
        "\xC2\xB5" => '&micro;',
        "\xC2\xB6" => '&para;',
        "\xC2\xB7" => '&middot;',
        "\xC2\xB8" => '&cedil;',
        "\xC2\xB9" => '&sup1;',
        "\xC2\xBA" => '&ordm;',
        "\xC2\xBB" => '&raquo;',
        "\xC2\xBC" => '&frac14;',
        "\xC2\xBD" => '&frac12;',
        "\xC2\xBE" => '&frac34;',
        "\xC2\xBF" => '&iquest;',
        "\xC3\x80" => '&Agrave;',
        "\xC3\x81" => '&Aacute;',
        "\xC3\x82" => '&Acirc;',
        "\xC3\x83" => '&Atilde;',
        "\xC3\x84" => '&Auml;',
        "\xC3\x85" => '&Aring;',
        "\xC3\x86" => '&AElig;',
        "\xC3\x87" => '&Ccedil;',
        "\xC3\x88" => '&Egrave;',
        "\xC3\x89" => '&Eacute;',
        "\xC3\x8A" => '&Ecirc;',
        "\xC3\x8B" => '&Euml;',
        "\xC3\x8C" => '&Igrave;',
        "\xC3\x8D" => '&Iacute;',
        "\xC3\x8E" => '&Icirc;',
        "\xC3\x8F" => '&Iuml;',
        "\xC3\x90" => '&ETH;',
        "\xC3\x91" => '&Ntilde;',
        "\xC3\x92" => '&Ograve;',
        "\xC3\x93" => '&Oacute;',
        "\xC3\x94" => '&Ocirc;',
        "\xC3\x95" => '&Otilde;',
        "\xC3\x96" => '&Ouml;',
        "\xC3\x97" => '&times;',
        "\xC3\x98" => '&Oslash;',
        "\xC3\x99" => '&Ugrave;',
        "\xC3\x9A" => '&Uacute;',
        "\xC3\x9B" => '&Ucirc;',
        "\xC3\x9C" => '&Uuml;',
        "\xC3\x9D" => '&Yacute;',
        "\xC3\x9E" => '&THORN;',
        "\xC3\x9F" => '&szlig;',
        "\xC3\xA0" => '&agrave;',
        "\xC3\xA1" => '&aacute;',
        "\xC3\xA2" => '&acirc;',
        "\xC3\xA3" => '&atilde;',
        "\xC3\xA4" => '&auml;',
        "\xC3\xA5" => '&aring;',
        "\xC3\xA6" => '&aelig;',
        "\xC3\xA7" => '&ccedil;',
        "\xC3\xA8" => '&egrave;',
        "\xC3\xA9" => '&eacute;',
        "\xC3\xAA" => '&ecirc;',
        "\xC3\xAB" => '&euml;',
        "\xC3\xAC" => '&igrave;',
        "\xC3\xAD" => '&iacute;',
        "\xC3\xAE" => '&icirc;',
        "\xC3\xAF" => '&iuml;',
        "\xC3\xB0" => '&eth;',
        "\xC3\xB1" => '&ntilde;',
        "\xC3\xB2" => '&ograve;',
        "\xC3\xB3" => '&oacute;',
        "\xC3\xB4" => '&ocirc;',
        "\xC3\xB5" => '&otilde;',
        "\xC3\xB6" => '&ouml;',
        "\xC3\xB7" => '&divide;',
        "\xC3\xB8" => '&oslash;',
        "\xC3\xB9" => '&ugrave;',
        "\xC3\xBA" => '&uacute;',
        "\xC3\xBB" => '&ucirc;',
        "\xC3\xBC" => '&uuml;',
        "\xC3\xBD" => '&yacute;',
        "\xC3\xBE" => '&thorn;',
        "\xC3\xBF" => '&yuml;',
        // Latin Extended-A
        "\xC5\x92" => '&OElig;',
        "\xC5\x93" => '&oelig;',
        "\xC5\xA0" => '&Scaron;',
        "\xC5\xA1" => '&scaron;',
        "\xC5\xB8" => '&Yuml;',
        // Spacing Modifier Letters
        "\xCB\x86" => '&circ;',
        "\xCB\x9C" => '&tilde;',
        // General Punctuation
        "\xE2\x80\x82" => '&ensp;',
        "\xE2\x80\x83" => '&emsp;',
        "\xE2\x80\x89" => '&thinsp;',
        "\xE2\x80\x8C" => '&zwnj;',
        "\xE2\x80\x8D" => '&zwj;',
        "\xE2\x80\x8E" => '&lrm;',
        "\xE2\x80\x8F" => '&rlm;',
        "\xE2\x80\x93" => '&ndash;',
        "\xE2\x80\x94" => '&mdash;',
        "\xE2\x80\x98" => '&lsquo;',
        "\xE2\x80\x99" => '&rsquo;',
        "\xE2\x80\x9A" => '&sbquo;',
        "\xE2\x80\x9C" => '&ldquo;',
        "\xE2\x80\x9D" => '&rdquo;',
        "\xE2\x80\x9E" => '&bdquo;',
        "\xE2\x80\xA0" => '&dagger;',
        "\xE2\x80\xA1" => '&Dagger;',
        "\xE2\x80\xB0" => '&permil;',
        "\xE2\x80\xB9" => '&lsaquo;',
        "\xE2\x80\xBA" => '&rsaquo;',
        "\xE2\x82\xAC" => '&euro;',
        // Latin Extended-B
        "\xC6\x92" => '&fnof;',
        // Greek
        "\xCE\x91" => '&Alpha;',
        "\xCE\x92" => '&Beta;',
        "\xCE\x93" => '&Gamma;',
        "\xCE\x94" => '&Delta;',
        "\xCE\x95" => '&Epsilon;',
        "\xCE\x96" => '&Zeta;',
        "\xCE\x97" => '&Eta;',
        "\xCE\x98" => '&Theta;',
        "\xCE\x99" => '&Iota;',
        "\xCE\x9A" => '&Kappa;',
        "\xCE\x9B" => '&Lambda;',
        "\xCE\x9C" => '&Mu;',
        "\xCE\x9D" => '&Nu;',
        "\xCE\x9E" => '&Xi;',
        "\xCE\x9F" => '&Omicron;',
        "\xCE\xA0" => '&Pi;',
        "\xCE\xA1" => '&Rho;',
        "\xCE\xA3" => '&Sigma;',
        "\xCE\xA4" => '&Tau;',
        "\xCE\xA5" => '&Upsilon;',
        "\xCE\xA6" => '&Phi;',
        "\xCE\xA7" => '&Chi;',
        "\xCE\xA8" => '&Psi;',
        "\xCE\xA9" => '&Omega;',
        "\xCE\xB1" => '&alpha;',
        "\xCE\xB2" => '&beta;',
        "\xCE\xB3" => '&gamma;',
        "\xCE\xB4" => '&delta;',
        "\xCE\xB5" => '&epsilon;',
        "\xCE\xB6" => '&zeta;',
        "\xCE\xB7" => '&eta;',
        "\xCE\xB8" => '&theta;',
        "\xCE\xB9" => '&iota;',
        "\xCE\xBA" => '&kappa;',
        "\xCE\xBB" => '&lambda;',
        "\xCE\xBC" => '&mu;',
        "\xCE\xBD" => '&nu;',
        "\xCE\xBE" => '&xi;',
        "\xCE\xBF" => '&omicron;',
        "\xCF\x80" => '&pi;',
        "\xCF\x81" => '&rho;',
        "\xCF\x82" => '&sigmaf;',
        "\xCF\x83" => '&sigma;',
        "\xCF\x84" => '&tau;',
        "\xCF\x85" => '&upsilon;',
        "\xCF\x86" => '&phi;',
        "\xCF\x87" => '&chi;',
        "\xCF\x88" => '&psi;',
        "\xCF\x89" => '&omega;',
        "\xCF\x91" => '&thetasym;',
        "\xCF\x92" => '&upsih;',
        "\xCF\x96" => '&piv;',
        // General Punctuation
        "\xE2\x80\xA2" => '&bull;',
        "\xE2\x80\xA6" => '&hellip;',
        "\xE2\x80\xB2" => '&prime;',
        "\xE2\x80\xB3" => '&Prime;',
        "\xE2\x80\xBE" => '&oline;',
        "\xE2\x81\x84" => '&frasl;',
        // Letterlike Symbols
        "\xE2\x84\x98" => '&weierp;',
        "\xE2\x84\x91" => '&image;',
        "\xE2\x84\x9C" => '&real;',
        "\xE2\x84\xA2" => '&trade;',
        "\xE2\x84\xB5" => '&alefsym;',
        // Arrows
        "\xE2\x86\x90" => '&larr;',
        "\xE2\x86\x91" => '&uarr;',
        "\xE2\x86\x92" => '&rarr;',
        "\xE2\x86\x93" => '&darr;',
        "\xE2\x86\x94" => '&harr;',
        "\xE2\x86\xB5" => '&crarr;',
        "\xE2\x87\x90" => '&lArr;',
        "\xE2\x87\x91" => '&uArr;',
        "\xE2\x87\x92" => '&rArr;',
        "\xE2\x87\x93" => '&dArr;',
        "\xE2\x87\x94" => '&hArr;',
        // Mathematical Operators
        "\xE2\x88\x80" => '&forall;',
        "\xE2\x88\x82" => '&part;',
        "\xE2\x88\x83" => '&exist;',
        "\xE2\x88\x85" => '&empty;',
        "\xE2\x88\x87" => '&nabla;',
        "\xE2\x88\x88" => '&isin;',
        "\xE2\x88\x89" => '&notin;',
        "\xE2\x88\x8B" => '&ni;',
        "\xE2\x88\x8F" => '&prod;',
        "\xE2\x88\x91" => '&sum;',
        "\xE2\x88\x92" => '&minus;',
        "\xE2\x88\x97" => '&lowast;',
        "\xE2\x88\x9A" => '&radic;',
        "\xE2\x88\x9D" => '&prop;',
        "\xE2\x88\x9E" => '&infin;',
        "\xE2\x88\xA0" => '&ang;',
        "\xE2\x88\xA7" => '&and;',
        "\xE2\x88\xA8" => '&or;',
        "\xE2\x88\xA9" => '&cap;',
        "\xE2\x88\xAA" => '&cup;',
        "\xE2\x88\xAB" => '&int;',
        "\xE2\x88\xB4" => '&there4;',
        "\xE2\x88\xBC" => '&sim;',
        "\xE2\x89\x85" => '&cong;',
        "\xE2\x89\x88" => '&asymp;',
        "\xE2\x89\xA0" => '&ne;',
        "\xE2\x89\xA1" => '&equiv;',
        "\xE2\x89\xA4" => '&le;',
        "\xE2\x89\xA5" => '&ge;',
        "\xE2\x8A\x82" => '&sub;',
        "\xE2\x8A\x83" => '&sup;',
        "\xE2\x8A\x84" => '&nsub;',
        "\xE2\x8A\x86" => '&sube;',
        "\xE2\x8A\x87" => '&supe;',
        "\xE2\x8A\x95" => '&oplus;',
        "\xE2\x8A\x97" => '&otimes;',
        "\xE2\x8A\xA5" => '&perp;',
        "\xE2\x8B\x85" => '&sdot;',
        // Miscellaneous Technical
        "\xE2\x8C\x88" => '&lceil;',
        "\xE2\x8C\x89" => '&rceil;',
        "\xE2\x8C\x8A" => '&lfloor;',
        "\xE2\x8C\x8B" => '&rfloor;',
        "\xE2\x8C\xA9" => '&lang;',
        "\xE2\x8C\xAA" => '&rang;',
        // Geometric Shapes
        "\xE2\x97\x8A" => '&loz;',
        // Miscellaneous Symbols
        "\xE2\x99\xA0" => '&spades;',
        "\xE2\x99\xA3" => '&clubs;',
        "\xE2\x99\xA5" => '&hearts;',
        "\xE2\x99\xA6" => '&diams;'
    );
    $string = strtr($text, $conv);
    // now translate any remaining single-byte (Windows-1252) characters...
    $conv = array(
        chr(128) => "&euro;",
        chr(130) => "&sbquo;",
        chr(131) => "&fnof;",
        chr(132) => "&bdquo;",
        chr(133) => "&hellip;",
        chr(134) => "&dagger;",
        chr(135) => "&Dagger;",
        chr(136) => "&circ;",
        chr(137) => "&permil;",
        chr(138) => "&Scaron;",
        chr(139) => "&lsaquo;",
        chr(140) => "&OElig;",
        chr(145) => "&lsquo;",
        chr(146) => "&rsquo;",
        chr(147) => "&ldquo;",
        chr(148) => "&rdquo;",
        chr(149) => "&bull;",
        chr(150) => "&ndash;",
        chr(151) => "&mdash;",
        chr(152) => "&tilde;",
        chr(153) => "&trade;",
        chr(154) => "&scaron;",
        chr(155) => "&rsaquo;",
        chr(156) => "&oelig;",
        chr(159) => "&Yuml;", // byte 0x9F is the capital Y with diaeresis in Windows-1252
        chr(160) => "&nbsp;",
        chr(161) => "&iexcl;",
        chr(162) => "&cent;",
        chr(163) => "&pound;",
        chr(164) => "&curren;",
        chr(165) => "&yen;",
        chr(166) => "&brvbar;",
        chr(167) => "&sect;",
        chr(168) => "&uml;",
        chr(169) => "&copy;",
        chr(170) => "&ordf;",
        chr(171) => "&laquo;",
        chr(172) => "&not;",
        chr(173) => "&shy;",
        chr(174) => "&reg;",
        chr(175) => "&macr;",
        chr(176) => "&deg;",
        chr(177) => "&plusmn;",
        chr(178) => "&sup2;",
        chr(179) => "&sup3;",
        chr(180) => "&acute;",
        chr(181) => "&micro;",
        chr(182) => "&para;",
        chr(183) => "&middot;",
        chr(184) => "&cedil;",
        chr(185) => "&sup1;",
        chr(186) => "&ordm;",
        chr(187) => "&raquo;",
        chr(188) => "&frac14;",
        chr(189) => "&frac12;",
        chr(190) => "&frac34;",
        chr(191) => "&iquest;",
        chr(192) => "&Agrave;",
        chr(193) => "&Aacute;",
        chr(194) => "&Acirc;",
        chr(195) => "&Atilde;",
        chr(196) => "&Auml;",
        chr(197) => "&Aring;",
        chr(198) => "&AElig;",
        chr(199) => "&Ccedil;",
        chr(200) => "&Egrave;",
        chr(201) => "&Eacute;",
        chr(202) => "&Ecirc;",
        chr(203) => "&Euml;",
        chr(204) => "&Igrave;",
        chr(205) => "&Iacute;",
        chr(206) => "&Icirc;",
        chr(207) => "&Iuml;",
        chr(208) => "&ETH;",
        chr(209) => "&Ntilde;",
        chr(210) => "&Ograve;",
        chr(211) => "&Oacute;",
        chr(212) => "&Ocirc;",
        chr(213) => "&Otilde;",
        chr(214) => "&Ouml;",
        chr(215) => "&times;",
        chr(216) => "&Oslash;",
        chr(217) => "&Ugrave;",
        chr(218) => "&Uacute;",
        chr(219) => "&Ucirc;",
        chr(220) => "&Uuml;",
        chr(221) => "&Yacute;",
        chr(222) => "&THORN;",
        chr(223) => "&szlig;",
        chr(224) => "&agrave;",
        chr(225) => "&aacute;",
        chr(226) => "&acirc;",
        chr(227) => "&atilde;",
        chr(228) => "&auml;",
        chr(229) => "&aring;",
        chr(230) => "&aelig;",
        chr(231) => "&ccedil;",
        chr(232) => "&egrave;",
        chr(233) => "&eacute;",
        chr(234) => "&ecirc;",
        chr(235) => "&euml;",
        chr(236) => "&igrave;",
        chr(237) => "&iacute;",
        chr(238) => "&icirc;",
        chr(239) => "&iuml;",
        chr(240) => "&eth;",
        chr(241) => "&ntilde;",
        chr(242) => "&ograve;",
        chr(243) => "&oacute;",
        chr(244) => "&ocirc;",
        chr(245) => "&otilde;",
        chr(246) => "&ouml;",
        chr(247) => "&divide;",
        chr(248) => "&oslash;",
        chr(249) => "&ugrave;",
        chr(250) => "&uacute;",
        chr(251) => "&ucirc;",
        chr(252) => "&uuml;",
        chr(253) => "&yacute;",
        chr(254) => "&thorn;",
        chr(255) => "&yuml;");
    return strtr($string, $conv);
}
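For comparison, PHP's built-in `htmlentities()` already knows the HTML 4.01 named entities, so a much shorter sketch is possible. This is an alternative under stated assumptions, not JaseC's method: it assumes the mbstring extension is loaded, it guesses that any input that is not valid UTF-8 is Windows-1252, and unlike `filterText()` it also escapes `&`, `<`, `>` and quotes. The name `to_named_entities` is made up for illustration.

```php
<?php
// Hypothetical helper: normalize the input to UTF-8, then let
// htmlentities() replace every character that has a named entity.
function to_named_entities(string $text): string
{
    // Assumption: anything that is not valid UTF-8 is Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }
    return htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$fromUtf8   = to_named_entities("It\u{2019}s caf\u{00E9} time"); // UTF-8 input
$fromCp1252 = to_named_entities("It\x92s caf\xE9 time");         // Windows-1252 input

echo $fromUtf8, PHP_EOL;
echo $fromCp1252, PHP_EOL;
```

Both calls should yield the same entity-encoded string, since the Windows-1252 input is converted to UTF-8 before the entities are applied.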
Answer by Quamis
str_replace('’', "'", $dirty_string)
might give you a quick and dirty fix. But this looks to me like a character encoding problem: you may be reading the tweets in one encoding and displaying them in another.
To do this the "clean way", check your code and make sure you use the same encoding everywhere.
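The mismatch described here can be reproduced directly. The sketch below assumes the mbstring extension and assumes the wrong encoding in play is Windows-1252 (a common culprit, but only a guess for this particular site). It shows how the UTF-8 bytes of a curly apostrophe turn into three stray characters, and how that mistake can be reversed.

```php
<?php
$apostrophe = "\u{2019}"; // curly apostrophe, UTF-8 bytes E2 80 99

// Misread the three UTF-8 bytes as three Windows-1252 characters
// and re-encode the result as UTF-8.
$garbled = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');

// Undo the mistake: map the three characters back to single bytes,
// which are again the original UTF-8 sequence.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

echo bin2hex($apostrophe), PHP_EOL; // hex dump of the original bytes
echo $garbled, PHP_EOL;             // the three stray characters
echo bin2hex($repaired), PHP_EOL;   // hex dump after the repair
```

The reverse conversion only works while the garbled text is intact, so fixing the encoding at the source is still the cleaner route.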
Answer by Akanksha Sharma
To convert to HTML entities:
<?php
echo mb_convert_encoding(
    file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
    "HTML-ENTITIES",
    "UTF-8"
);
?>
See the docs for mb_convert_encoding for more encoding options.
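Note that the "HTML-ENTITIES" pseudo-encoding is deprecated as of PHP 8.2. A rough alternative, assuming the mbstring extension and UTF-8 input, is `mb_encode_numericentity()`. The wrapper name `to_numeric_entities` and the sample string are illustrative only, and the output uses numeric entities throughout.

```php
<?php
// Hypothetical wrapper: turn every non-ASCII character into a
// decimal numeric entity and leave ASCII untouched.
function to_numeric_entities(string $text): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

$encoded = to_numeric_entities("It\u{2019}s caf\u{00E9} time");
echo $encoded, PHP_EOL;
```

Because the result is pure ASCII, it can be embedded in a page of any encoding, which is the same goal the "HTML-ENTITIES" conversion serves.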