PHP: How do I convert Word smart quotes and em dashes in a string?
Original question: http://stackoverflow.com/questions/175785/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow (not to the translator).
Asked by GloryFish
I have a form with a textarea. Users enter a block of text which is stored in a database.
Occasionally a user will paste text from Word containing smart quotes or em dashes. Those characters appear in the database as garbled sequences such as â€œ.
What function should I call on the input string to convert smart quotes to regular quotes and em dashes to regular dashes?
I am working in PHP.
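One plausible mechanism behind the garbled sequences in the question, shown as a sketch: a smart quote stored as UTF-8 occupies three bytes, and if those bytes are later decoded as Windows-1252 (an assumption about where the mix-up happens), each byte turns into its own character. The snippet reproduces that with mb_convert_encoding().

```php
<?php
// A left double quotation mark (U+201C) encoded as UTF-8 is three bytes.
$smartQuote = "\u{201C}";
echo bin2hex($smartQuote), PHP_EOL; // e2809c

// Simulate the mix-up: decode those UTF-8 bytes as if they were Windows-1252.
// 0xE2 -> "â", 0x80 -> "€", 0x9C -> "œ"
$mojibake = mb_convert_encoding($smartQuote, 'UTF-8', 'Windows-1252');
echo $mojibake, PHP_EOL; // â€œ
```

Seeing â€œ therefore suggests the bytes themselves are fine UTF-8 and only the decoding step is wrong.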
Update: Thanks for all of the great responses so far. The page on Joel's site about encodings is very informative: http://www.joelonsoftware.com/articles/Unicode.html
Some notes on my environment:
The MySQL database is using UTF-8 encoding. Likewise, the HTML pages that display the content are using UTF-8. (Update: this is now set explicitly via the meta content-type tag.)
On those pages, the smart quotes and em dashes appear as a diamond with a question mark.
Solution:
Thanks again for the responses. The solution was twofold:
- Make sure the database and HTML files were explicitly set to use UTF-8 encoding.
- Use htmlspecialchars() instead of htmlentities().
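A sketch of why the second fix can matter, assuming the page is UTF-8: htmlspecialchars() escapes only &, <, >, and quotes, so multi-byte characters pass through untouched, whereas htmlentities() rewrites every character it has an entity for, according to the charset it is given. PHP versions before 5.4 defaulted that charset to ISO-8859-1, which is one likely way UTF-8 smart quotes got split apart.

```php
<?php
$input = "\u{201C}Quote\u{201D} <b>";

// Only HTML-special characters are escaped; the smart quotes stay as UTF-8.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $safe, PHP_EOL; // “Quote” &lt;b&gt;

// htmlentities() with the right charset maps the quotes to named entities.
echo htmlentities($input, ENT_QUOTES, 'UTF-8'), PHP_EOL; // &ldquo;Quote&rdquo; &lt;b&gt;

// With the wrong charset, each byte of a smart quote is treated as a separate
// Latin-1 character; the output begins with "&acirc;" (from byte 0xE2).
echo htmlentities($input, ENT_QUOTES, 'ISO-8859-1'), PHP_EOL;
```

Passing 'UTF-8' explicitly to either function avoids depending on the PHP version's default.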
Accepted answer by theraccoonbear
This sounds like a Unicode issue. Joel Spolsky has a good jumping off point on the topic: http://www.joelonsoftware.com/articles/Unicode.html
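To make the "Unicode issue" concrete: PHP's classic string functions count bytes, while a UTF-8 smart quote is one character spread over three bytes. A small illustration using strlen(), mb_strlen(), and mb_check_encoding():

```php
<?php
// "“Hi”" written with escapes so the source file's encoding doesn't matter.
$text = "\u{201C}Hi\u{201D}";

echo strlen($text), PHP_EOL;             // 8: bytes (3 + 2 + 3)
echo mb_strlen($text, 'UTF-8'), PHP_EOL; // 4: characters

// Cutting by bytes can split a character, leaving invalid UTF-8 behind.
var_dump(mb_check_encoding(substr($text, 0, 1), 'UTF-8')); // bool(false)
```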
Answer by Ates Goral
The mysql database is using UTF-8 encoding. Likewise, the html pages that display the content are using UTF-8.
The content of the HTML can be in UTF-8, yes, but are you explicitly setting the content type (encoding) of your HTML pages (generated via PHP?) to UTF-8 as well? Try returning a Content-Type header of "text/html;charset=utf-8" or add <meta> tags to your HTML:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
That way, the content type of the data submitted to PHP will also be the same.
I had a similar issue and adding the <meta> tag worked for me.
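A minimal sketch of declaring UTF-8 in both places from PHP: the HTTP header (sent with header() before any output) and the <meta> tag. render_page() is a hypothetical helper for illustration, not part of the original answer.

```php
<?php
// Send the charset in the HTTP header; this must happen before any output.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: wrap escaped body text in a minimal UTF-8 page.
function render_page(string $body): string
{
    $escaped = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"/>\n"
        . "</head>\n<body><p>{$escaped}</p></body>\n</html>\n";
}

echo render_page("\u{201C}Smart quotes\u{201D} stay intact");
```

Declaring the charset in both places keeps the page readable even if it is saved and opened without the HTTP header.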
Answer by Kip
It sounds like the real problem is that your database is not using the same character encoding as your page (which should probably be UTF-8). In that case, if any user submits a non-ASCII character you'll probably see weird characters in the database. Finding and fixing just a few of them (curly quotes and em dashes) isn't going to solve the real problem.
Here is some info on migrating your database to another character encoding, at least for a MySQL database.
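Before migrating, it can help to see which columns are not UTF-8 yet. A sketch assuming MySQL and an existing PDO connection; the function name is hypothetical, and the query reads the standard information_schema.COLUMNS view.

```php
<?php
// Hypothetical helper: list text columns in $schema whose character set is
// not one of MySQL's UTF-8 variants (candidates for migration).
function find_non_utf8_columns(PDO $pdo, string $schema): array
{
    $sql = "SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
              FROM information_schema.COLUMNS
             WHERE TABLE_SCHEMA = ?
               AND CHARACTER_SET_NAME IS NOT NULL
               AND CHARACTER_SET_NAME NOT IN ('utf8', 'utf8mb3', 'utf8mb4')";
    $stmt = $pdo->prepare($sql);
    $stmt->execute([$schema]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage against a real server (placeholder schema name):
// foreach (find_non_utf8_columns($pdo, 'app') as $col) {
//     echo "{$col['TABLE_NAME']}.{$col['COLUMN_NAME']}: {$col['CHARACTER_SET_NAME']}\n";
// }
```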
Answer by ConroyP
This is an unfortunately all-too-common problem, not helped by PHP's very poor handling of character sets.
What we do is force the text through iconv():
// Convert input data to UTF8, ignore any odd (MS Word..) chars
// that don't translate
$input = iconv("ISO-8859-1","UTF-8//IGNORE",$input);
The //IGNORE flag means that anything that can't be translated will be thrown away.
If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded.
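Every byte is a valid ISO-8859-1 character, so that particular conversion always succeeds; the bigger risk is running it on input that is already UTF-8, which double-encodes it. A sketch of a guarded variant that swaps iconv() for the mbstring functions; ensure_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input isn't UTF-8 already.
function ensure_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8: converting again would double-encode
    }
    // Fall back to treating the bytes as ISO-8859-1, as in the answer above.
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

echo ensure_utf8("caf\xE9"), PHP_EOL; // café
```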
Answer by Patrick McElhaney
In my experience, it's easier to just accept the smart quotes and make sure you're using the same encoding everywhere. To start, add this to your form tag: accept-charset="utf-8"
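A sketch of that approach, keeping the smart quotes as typed: the form asks the browser to submit UTF-8, and the server only checks that what arrives really is valid UTF-8. Both function names and the field name "body" are hypothetical.

```php
<?php
// Hypothetical form renderer: accept-charset asks the browser to submit UTF-8.
function render_form(string $current = ''): string
{
    // Escape for HTML while leaving smart quotes untouched.
    $value = htmlspecialchars($current, ENT_QUOTES, 'UTF-8');
    return '<form method="post" accept-charset="utf-8">'
        . '<textarea name="body">' . $value . '</textarea>'
        . '<button type="submit">Save</button>'
        . '</form>';
}

// Hypothetical server-side check: keep the text as-is if it is valid UTF-8,
// otherwise reject it rather than storing garbage.
function accept_body(string $raw): ?string
{
    return mb_check_encoding($raw, 'UTF-8') ? $raw : null;
}

echo render_form("\u{2018}kept as typed\u{2019}"), PHP_EOL;
// In a real handler: $body = accept_body($_POST['body'] ?? '');
```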
Answer by Greg
You could try mb_convert_encoding() from ISO-8859-1 to UTF-8:
$str = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
This assumes you want UTF-8 and that the conversion can find reasonable replacements. If not, replace them yourself with str_replace() or preg_replace().
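A sketch of the preg_replace() route, assuming the input is already valid UTF-8. The /u modifier makes PCRE treat pattern and subject as UTF-8, so \x{2018} and similar escapes match whole code points; the function name is hypothetical.

```php
<?php
// Hypothetical helper: replace smart quotes and en/em dashes with ASCII.
function replace_smart_punctuation(string $text): string
{
    $result = preg_replace(
        ['/[\x{2018}\x{2019}]/u', '/[\x{201C}\x{201D}]/u', '/[\x{2013}\x{2014}]/u'],
        ["'", '"', '-'],
        $text
    );
    // preg_replace() returns null on error, e.g. when $text is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo replace_smart_punctuation("\u{2018}a\u{2019} \u{2014} \u{201C}b\u{201D}"), PHP_EOL; // 'a' - "b"
```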
Answer by mspmsp
We would often use standard string replace functions for that. Even though the nature of ASCII/Unicode in that context is pretty murky, it works. Just make sure your php file is saved in the right encoding format, etc.
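One way to sidestep the file-encoding caveat is to write the characters as \u{...} escapes (PHP 7.0+), so the replacement map works regardless of how the .php file is saved. A sketch using strtr() with a hypothetical helper name; the exact mapping is a choice, not a standard.

```php
<?php
// Hypothetical helper: map common Word punctuation to plain ASCII.
// Assumes $text is valid UTF-8.
function normalize_smart_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    return strtr($text, $map);
}

echo normalize_smart_punctuation("\u{201C}It\u{2019}s fine\u{201D} \u{2014} really"), PHP_EOL;
// prints: "It's fine" - really
```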
Answer by Joeri Sebrechts
You have to be sure your database connection is configured to accept and provide UTF-8 from and to the client (otherwise it will convert to the "default", which is usually latin1).
In practice this means running a query SET NAMES 'utf8';
http://www.phpwact.org/php/i18n/utf-8/mysql
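With PDO, the connection charset can be requested in the DSN rather than by running SET NAMES by hand (the charset DSN option is honored since PHP 5.3.6). A sketch with a hypothetical helper and placeholder host, database, and credentials; utf8mb4 is MySQL's full UTF-8, while its older utf8 is a 3-byte subset that still covers smart quotes.

```php
<?php
// Hypothetical helper: build a PDO DSN that asks MySQL for a UTF-8 connection,
// comparable to running SET NAMES right after connecting.
function utf8_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

echo utf8_mysql_dsn('localhost', 'app'), PHP_EOL; // mysql:host=localhost;dbname=app;charset=utf8mb4

// Usage against a real server (placeholder credentials):
// $pdo = new PDO(utf8_mysql_dsn('localhost', 'app'), 'user', 'secret');
```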
Also, smart quotes are part of the Windows-1252 character set, not ISO-8859-1 (Latin-1). Not very relevant to your problem, but just FYI: the euro symbol is in there as well.
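A quick illustration of that point: Word's curly quotes sit at bytes 0x91 to 0x94 in Windows-1252 (0x96 and 0x97 are the en and em dashes), a range that ISO-8859-1 assigns to control codes. Decoding such bytes as Windows-1252 yields the proper characters:

```php
<?php
// Raw Windows-1252 bytes: 0x93/0x94 are curly double quotes, 0x92 is a curly apostrophe.
$cp1252 = "\x93Hello\x94 it\x92s";

$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
echo $utf8, PHP_EOL; // “Hello” it’s
```

Decoding the same bytes as ISO-8859-1 would instead produce invisible control characters, which is why the source charset matters when converting legacy text.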
Answer by hawshy
The problem was the MySQL charset. I fixed my issues with this line of code:
mysql_set_charset('utf8',$link);
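mysql_set_charset() belongs to the original mysql extension, which was removed in PHP 7. A sketch of the mysqli equivalent, with a hypothetical helper name and placeholder connection details:

```php
<?php
// Hypothetical helper: open a mysqli connection and set its character set.
// mysqli::set_charset() is the counterpart of the old mysql_set_charset().
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $link = new mysqli($host, $user, $pass, $db);
    // Applies to data sent to and received from the server on this connection.
    $link->set_charset('utf8mb4');
    return $link;
}

// Usage (placeholder credentials):
// $link = connect_utf8('localhost', 'user', 'secret', 'app');
```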
Answer by Dazbert
You have to manually change the collation of individual columns to UTF8; changing the database overall won't alter these.
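Changing a database's default character set only affects tables created afterwards. For an existing table, MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET rewrites all of its text columns at once. A sketch that builds that statement for a placeholder table name (the helper is hypothetical); back up first, and note that if a latin1 column already holds UTF-8 bytes, CONVERT TO will double-encode them.

```php
<?php
// Hypothetical helper: build the statement that converts every text column
// of one table to the given character set and collation.
function convert_table_sql(
    string $table,
    string $charset = 'utf8mb4',
    string $collation = 'utf8mb4_unicode_ci'
): string {
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET {$charset} COLLATE {$collation}";
}

echo convert_table_sql('posts'), PHP_EOL;
// Usage against a real server: $pdo->exec(convert_table_sql('posts'));
```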

