Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2045058/

Time: 2020-08-25 04:47:15  Source: igfitidea

Converting these types of unicode to UTF8 in PHP

php unicode utf-8

Asked by Simon

I am trying to convert this into readable UTF-8 text in PHP:

Tel Aviv-Yafo (Hebrew: \u05ea\u05b5\u05bc\u05dc\u05be\u05d0\u05b8\u05d1\u05b4\u05d9\u05d1-\u05d9\u05b8\u05e4\u05d5\u05b9; Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628\u200e, Tall \u02bcAb\u012bb), usually called Tel Aviv

Any ideas on how to do so?

I tried several methods I found online, but none of them worked.

In this case the Unicode text is in Hebrew and Arabic.

Answered by dzeikei

None of the other answers works perfectly as is. Combining them and adding my own fix gives this:

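// Four backslashes in the PHP string give \\ in the regex, which matches one literal backslash
$replacedString = preg_replace("/\\\\u([0-9abcdef]{4})/", "&#x$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');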

This one definitely does work :)
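
As of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated. The following is a minimal sketch of the same two-step idea using html_entity_decode for the second step instead. The function name unicode_escapes_to_utf8 is made up for illustration, and the character class also accepts uppercase hex digits, which is an assumption beyond the answer above.

```php
<?php
// Hypothetical wrapper name, used here for illustration only.
function unicode_escapes_to_utf8(string $originalString): string
{
    // Step 1: rewrite each \uXXXX escape as a hexadecimal HTML entity (&#xXXXX;).
    // In a single-quoted PHP string, \\\\ becomes \\ in the regex,
    // which matches one literal backslash in the input.
    $replacedString = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $originalString);

    // Step 2: decode the entities into UTF-8 characters.
    return html_entity_decode($replacedString, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo unicode_escapes_to_utf8('Hebrew: \u05ea\u05dc'), "\n";
```

Note that the second step decodes every HTML entity in the string, including any named entities such as &amp; that were already present in the input, so this sketch only fits input that is not itself HTML.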

Answered by Yaron Cohen

I encountered the same problem recently, so I was glad to see this question. After some testing, I found that the following code works:

$replacedString = preg_replace("/\\\\u([0-9abcdef]{4})/", "&#x$1;", $original_string);
// $unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');

The only thing I changed is that I commented out the second line of code, so the result contains HTML numeric entities rather than UTF-8 bytes and relies on the browser to render them. The web page, however, must be set to display UTF-8.

Enjoy!

Answered by mykhi

It doesn't always work, because a \uXXXX code can contain letters as well as digits. Try replacing \d (digits only) with \w (which matches letters, digits, and the underscore).

function unicode_conv($originalString) {
  // The four backslashes in the pattern are necessary to match the literal \u in the original string
  // Note: "&#$1;" is a decimal entity; hexadecimal codes need "&#x$1;"
  $replacedString = preg_replace("/\\\\u(\w{4})/", "&#$1;", $originalString);
  $unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
  return $unicodeString;
}
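
To make the difference between the two character classes concrete, here is a small sketch that lists which escapes each pattern matches. The sample string and variable names are made up for illustration. It uses an explicit hexadecimal class rather than \w, since \w would also accept non-hex letters and the underscore.

```php
<?php
// Illustrative sample: one escape with a hex letter, two with digits only.
$sample = 'Arabic: \u062a\u0644 \u0623';

// \d{4} accepts only four decimal digits, so \u062a is skipped.
preg_match_all('/\\\\u(\d{4})/', $sample, $digitsOnly);

// [0-9a-fA-F]{4} accepts any four hexadecimal digits.
preg_match_all('/\\\\u([0-9a-fA-F]{4})/', $sample, $hexDigits);

print_r($digitsOnly[1]); // captured groups: 0644, 0623
print_r($hexDigits[1]);  // captured groups: 062a, 0644, 0623
```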

Answered by petr

You should add 'x' after '#' in the replacement string to indicate that the number is hexadecimal.

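// Note: \d matches decimal digits only, so codes containing hex letters (such as \u05ea) are left unconverted
$replacedString = preg_replace("/\\\\u(\d{4})/", "&#x$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');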

Answered by Amber

See this comment for a way to get a Unicode character from its numeric code. You could then write a regex replacement that replaces each \uXXXX pattern with the equivalent character.
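
The regex-replacement idea described above can be sketched with preg_replace_callback and mb_chr (available from PHP 7.2 with the mbstring extension). This is only one possible way to do it, not the linked comment's code; the function name unescape_unicode is hypothetical.

```php
<?php
// Hypothetical helper name; not taken from the linked comment.
function unescape_unicode(string $text): string
{
    return preg_replace_callback(
        // \\\\ in a single-quoted PHP string is \\ in the regex: one literal backslash.
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            // hexdec() turns "05ea" into 1514; mb_chr() encodes that code point as UTF-8.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for invalid code points; leave such escapes untouched.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo unescape_unicode('Tall \u02bcAb\u012bb'), "\n";
```

This sketch converts each four-digit escape on its own, so characters outside the Basic Multilingual Plane that are written as two surrogate escapes are left as they are.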

Alternatively, you could replace each \uXXXX pattern with its matching &#XXXX; HTML entity form, and then use the following:

mb_convert_encoding(string_with_html_entities, 'UTF-8', 'HTML-ENTITIES');

More complete example:

// The four backslashes in the pattern are necessary to match the literal \u in the original string
$replacedString = preg_replace("/\\\\u(\d{4})/", "&#$1;", $originalString);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');

Answered by Simon

I am trying this code:

function unicode_conv($originalString) {
  // The four backslashes in the pattern are necessary to match the literal \u in the original string
  $replacedString = preg_replace("/\\\\u(\d{4})/", "&#$1;", $originalString);
  $unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
  return $unicodeString;
}

echo unicode_conv("Tel Aviv-Yafo (Hebrew: \u05ea\u05b5\u05bc\u05dc\u05be\u05d0\u05b8\u05d1\u05b4\u05d9\u05d1-\u05d9\u05b8\u05e4\u05d5\u05b9; Arabic: \u062a\u0644 \u0623\u0628\u064a\u0628\u200e, Tall \u02bcAb\u012bb), usually called Tel Aviv, is the second largest city in Israel, with an estimated population of 393,900. The city is situated on the Israeli Mediterranean coast, with a land area of 51.8\u00a0square kilometres (20.0\u00a0sq\u00a0mi). It is the largest and most populous city in the metropolitan area of Gush Dan, home to 3.15\u00a0million people as of 2008. The city is governed by the Tel Aviv-Yafo municipality, headed by Ron Huldai.\nTel Aviv was founded in 1909 on the outskirts of the ancient port city of Jaffa (Hebrew: \u05d9\u05b8\u05e4\u05d5\u05b9\u200e, Yafo; Arabic: \u064a\u0627\u0641\u0627\u200e, Yaffa). The growth of Tel Aviv soon outpaced Jaffa, which was largely Arab at the time. Tel Aviv and Jaffa were merged into a single municipality in 1950, two years after the establishment of the State of Israel. Tel Aviv's White City, designated a UNESCO World Heritage Site in 2003, comprises the world's largest concentration of Modernist-style buildings.\nTel Aviv is classified as a beta+...");

The result isn't correct. It doesn't make much of a difference: a few letters are changed to what look like Greek or Russian characters rather than Hebrew or Arabic.

It's as if the entity number is incorrect.
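
One way to check that suspicion, assuming the replacement produces entities of the form &#XXXX; with no x after the #: the four characters are then read as a decimal number, so \u0644 becomes code point 644 (hexadecimal 0284) instead of hexadecimal 0644. A short sketch comparing the two readings, using html_entity_decode purely for illustration:

```php
<?php
// The same four digits, read as a decimal entity and as a hexadecimal entity.
$decimal = html_entity_decode('&#0644;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$hex     = html_entity_decode('&#x0644;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Print the code point each entity decoded to.
printf("decimal entity: U+%04X\n", mb_ord($decimal, 'UTF-8')); // U+0284, a Latin-script letter
printf("hex entity:     U+%04X\n", mb_ord($hex, 'UTF-8'));     // U+0644, Arabic letter lam
```

Escapes that contain hex letters, such as \u05ea, are not matched by \d{4} at all, which would explain why most of the text is left unchanged.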
