Original source: http://stackoverflow.com/questions/10595691/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert ASCII to plaintext in PHP
Asked by e_r
I am scraping some sites, and have ASCII text that I want to convert to plain text for storing in a DB. For example I want
I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen.
It's almost impossible to convey how pumped I am
now that I've seen it.
converted to
I have got to tell anyone who will listen that this is
one of THE best adventure movies I've ever seen. It's
almost impossible to convey how pumped I am now that
I've seen it.
I have googled my fingers bloody, any help?
Answered by ash108
You can use html_entity_decode:
echo html_entity_decode('...', ENT_QUOTES, 'UTF-8');
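To make the call concrete, here is a minimal sketch with a made-up input string. The sample text and the variable names are illustrative assumptions, not the asker's actual scraped data. It decodes a numeric apostrophe entity, a numeric line-feed entity, and a named entity:

```php
<?php
// Hypothetical scraped input: &#39; is an apostrophe, &#10; a line feed,
// and &amp; an ampersand.
$scraped = 'I&#39;ve seen it.&#10;Tom &amp; Jerry';

// ENT_QUOTES decodes both double- and single-quote entities;
// 'UTF-8' fixes the output encoding regardless of the PHP version.
$plain = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

echo $plain;
```

With ENT_COMPAT instead of ENT_QUOTES, the single-quote entity would be left undecoded, which is why the flag matters for text like this.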
A few notes:
Please note that it looks like you actually want to convert from an HTML-encoded string (one containing character entities) to plain text, rather than from ASCII. This example converts to UTF-8, which is an ASCII-compatible encoding: every ASCII character (i.e. with a char code below 128) is encoded identically. If you really want plain ASCII (thus losing all accented characters and characters from other scripts), you should strip the offending characters separately.
The last argument ('UTF-8') is necessary to keep behavior consistent across PHP versions, since the default value changed in PHP 5.4.0.
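If plain ASCII really is the goal, the stripping step mentioned above could look roughly like this. This is a sketch, not part of the original answer: `strip_non_ascii` is a hypothetical helper name, and simply deleting bytes is only one possible policy (transliteration would be another).

```php
<?php
// Hypothetical helper (not a PHP built-in): drop every byte outside
// the 7-bit ASCII range.
function strip_non_ascii(string $text): string
{
    // Without the /u modifier the pattern is applied byte by byte, so every
    // byte of a multi-byte UTF-8 sequence (all of them >= 0x80) is removed.
    return preg_replace('/[^\x00-\x7F]/', '', $text) ?? '';
}

// Decode the entities first, then strip whatever is not ASCII.
$decoded = html_entity_decode('caf&eacute; &amp; cr&egrave;me', ENT_QUOTES, 'UTF-8');
$ascii = strip_non_ascii($decoded);

echo $ascii;
```

Note that accented letters are removed outright rather than replaced with their unaccented counterparts, so "café" becomes "caf".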
Update: Example with your text in ideone.
Update 2: Changed ENT_COMPAT to ENT_QUOTES at @Daan's suggestion.

