PHP fwrite() and UTF8
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/6336586/
fwrite() and UTF8
Asked by Lizard
I am creating a file using PHP's fwrite() and I know all my data is in UTF8 (I have done extensive testing on this: when saving data to the DB and outputting it on a normal webpage, everything works fine and reports as UTF8), but I am being told the file I am outputting contains non-UTF8 data :( Is there a command in bash (CentOS) to check the format of a file?
When using vim it shows the content as:
Dona~@~Yt do anything .... Ita~@~Ys a great site with everything....Wea~@~Yve only just launched/
Any help would be appreciated: either confirming the file is UTF8, or showing how to write UTF8 content to a file.
UPDATE
To clarify how I know I have data in UTF8, I have done the following:
- DB is set to utf8
- When saving data to the database I run this first:
  $enc = mb_detect_encoding($data);
  $data = mb_convert_encoding($data, "UTF-8", $enc);
- Just before I run fwrite I have checked the data with the following. Note each piece of data returns 'IS utf-8':
  if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'NOT UTF-8'; else print 'IS utf-8';
Thanks!
Accepted answer by Lizard
The only thing I had to do was add a UTF8 BOM to the CSV. The data was correct, but the file reader (an external application) couldn't read the file properly without the BOM.
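As a rough illustration of the fix described above, the sketch below writes a CSV file that starts with the UTF-8 BOM. The helper name writeCsvWithBom, the sample rows, and the temporary file path are assumptions made for this example, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): writes rows as CSV,
// preceded by the three-byte UTF-8 BOM that some readers require.
function writeCsvWithBom(string $path, array $rows): void
{
    $fp = fopen($path, 'wb'); // binary mode, so bytes are written unchanged
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    fwrite($fp, "\xEF\xBB\xBF"); // the BOM must be the very first bytes
    foreach ($rows as $row) {
        fputcsv($fp, $row, ',', '"', '\\');
    }
    fclose($fp);
}

// Sample data with a right single quotation mark (U+2019).
$path = tempnam(sys_get_temp_dir(), 'csv');
writeCsvWithBom($path, [['id', 'comment'], [1, "Don\u{2019}t do anything"]]);

$bytes = file_get_contents($path);
echo bin2hex(substr($bytes, 0, 3)), "\n"; // efbbbf
```

The BOM is written once, before any row, because a reader that relies on it only looks at the start of the file.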
Answer by Florin Sima
If you know the data is in UTF8, then you want to set up the header.
I wrote a solution in answer to another thread.
The solution is the following: as the UTF-8 byte-order mark is \xef\xbb\xbf, we should add it to the very beginning of the document.
<?php
function writeStringToFile($file, $string){
    $f = fopen($file, "wb");
    // Prepend the BOM to the content (not to the file name); this is what makes the magic
    $string = "\xEF\xBB\xBF" . $string;
    fputs($f, $string);
    fclose($f);
}
?>
You can adapt it to your code; basically you just want to make sure that you write a UTF8 file (as you said, you know your content is UTF8 encoded).
Answer by hakre
fwrite() may not be binary safe when the file was opened in text mode. That means that your data, be it correctly encoded or not, might get mangled by this command or its underlying routines.
To be on the safe side, you should use fopen() with the binary mode flag, that is b. Afterwards, fwrite() will save your string data "as-is", which in PHP means as binary data, because strings in PHP are binary strings.
Background: Some systems differentiate between text and binary data. The binary flag explicitly tells PHP on such systems to use binary output. When you deal with UTF-8 you should take care that the data does not get mangled. That is prevented by handling the string data as binary data.
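A minimal sketch of the binary-mode write described above, assuming a temporary file and a sample string chosen for this example. It writes the string with the b flag and reads it back to confirm the bytes are unchanged.

```php
<?php
// Sample UTF-8 string containing U+2019 (three bytes: E2 80 99).
$data = "Don\u{2019}t do anything\n";

$path = tempnam(sys_get_temp_dir(), 'bin');
$fp = fopen($path, 'wb');      // 'b' asks for untranslated, binary output
$written = fwrite($fp, $data); // number of bytes actually written
fclose($fp);

$readBack = file_get_contents($path);

var_dump($written === strlen($data)); // bool(true)
var_dump($readBack === $data);        // bool(true)
```

If the second comparison ever came back false, the write path itself would be altering the bytes; when it is true, any encoding problem lies in the data or in the reader.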
However: if, contrary to what you said in your question, the UTF-8 encoding of the data was not preserved, then your encoding is already broken, and even binary-safe handling will keep it broken. Still, with the binary flag you ensure that it is not the fwrite() part of your application that is breaking things.
It has been rightfully written in another answer here that you cannot know the encoding if you only have the data. However, you can check whether the data validates as UTF-8, which gives you at least some chance to check the encoding. I've posted a PHP function that does this in a UTF-8 related question, so it might be of use if you need to debug things: see the answer to "SimpleXML and Chinese" and look for can_be_valid_utf8_statemachine, that's the name of the function.
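The state-machine function mentioned above is not reproduced here. As a stand-in, the sketch below uses the mbstring function mb_check_encoding() to test whether a byte string validates as UTF-8, and also reports whether it starts with a BOM. The helper name describeUtf8 and the sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: reports whether the bytes start with a UTF-8 BOM
// and whether the whole byte string is well-formed UTF-8.
function describeUtf8(string $bytes): array
{
    return [
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
    ];
}

$good    = "Don\u{2019}t";          // U+2019 encoded as E2 80 99
$withBom = "\xEF\xBB\xBF" . $good;  // same text, BOM in front
$bad     = "Don\x92t";              // Windows-1252 byte 0x92, not valid UTF-8

var_dump(describeUtf8($good));
var_dump(describeUtf8($withBom));
var_dump(describeUtf8($bad));
```

To check a file on disk, the same helper could be applied to the result of file_get_contents(). A passing check only says the bytes could be UTF-8; it does not prove the text is what was intended.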
Answer by Artefacto
The problem is your data is double encoded. I assume your original text is something like:
Don’t do anything
with ’, i.e., not the straight apostrophe, but the right single quotation mark.
If you write a PHP script with this content and encoded in UTF-8:
<?php
//File in UTF-8
echo utf8_encode("Don’t"); //this will double encode
You will get something similar to your output.
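To make the double encoding visible at the byte level, here is a small sketch. It assumes the mbstring extension and uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode() (a function that is deprecated as of PHP 8.2). The variable names are illustrative.

```php
<?php
// Correct UTF-8: U+2019 is the three bytes E2 80 99.
$original = "Don\u{2019}t";

// Treating those bytes as ISO-8859-1 and converting to UTF-8 again
// re-encodes each of the three bytes as its own two-byte character.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex("\u{2019}"), "\n";                     // e28099
echo bin2hex(substr($double, 3, 6)), "\n";          // c3a2c280c299
echo strlen($original), ' ', strlen($double), "\n"; // 7 10

// Undoing exactly one layer of encoding restores the original bytes.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);
```

The bytes C3 A2, C2 80 and C2 99 decode to an a with circumflex followed by two control characters, which matches the a~@~Y pattern vim showed in the question.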
Answer by Du Peng
//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));
Answer by steffanjj
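$handle = fopen($file, "w");
// Write the UTF-8 BOM first
fwrite($handle, pack("CCC", 0xef, 0xbb, 0xbf));
// $data is assumed to hold the UTF-8 content (the original wrote $file, the file name, here)
fwrite($handle, $data);
fclose($handle);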
Answer by OZ_
"I know all my data is in UTF8" - wrong.
Encoding is not the format of a file. So, check the charset in the headers of the page you are taking the data from:
header("Content-type: text/html; charset=utf-8;");
And check whether the data really is in a multi-byte encoding:
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'not UTF-8';
else print 'utf-8';
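It may help to see what the strlen()/mb_strlen() comparison above actually measures. The sketch below, with a hypothetical helper hasMultibyte and sample strings chosen for illustration, suggests that the comparison only reports whether multi-byte sequences are present. It says nothing about whether the text is valid or correctly encoded.

```php
<?php
// Hypothetical helper: true when the byte count differs from the
// character count, i.e. when multi-byte sequences are present.
function hasMultibyte(string $s): bool
{
    return strlen($s) !== mb_strlen($s, 'UTF-8');
}

$ascii  = "Dont";                // ASCII only, which is also valid UTF-8
$utf8   = "Don\u{2019}t";        // correct UTF-8 with U+2019
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1'); // double encoded

var_dump(hasMultibyte($ascii));               // bool(false), yet it is valid UTF-8
var_dump(hasMultibyte($utf8));                // bool(true)
var_dump(hasMultibyte($double));              // bool(true), although the text is wrong
var_dump(mb_check_encoding($ascii, 'UTF-8')); // bool(true)
```

So a result of 'utf-8' from the comparison is consistent with double-encoded data, and plain ASCII input is reported as 'not UTF-8' even though it is valid.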
Answer by mohamed isam
Try this simple method, which is more useful: add the following to the top of the page, before the <body> tag:
<head>
<meta charset="utf-8">
</head>