How to detect the encoding of an uploaded CSV file
Source: http://stackoverflow.com/questions/18636675/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow
Asked by Tony-M
I have a data.csv file that must be uploaded to the server, parsed ...
The file can come in different encodings. I must detect the encoding and convert the file to UTF-8.
At the moment the PHP function mb_detect_encoding always returns UTF-8. I tried:
<?php
mb_detect_encoding(file_get_contents($_FILES["csv_uploadfile"]["tmp_name"]));
or
<?php
mb_detect_encoding(file_get_contents($saved_file_path));
In both cases mb_detect_encoding returns UTF-8.
If I use the bash command
$ file -bi csv_import_1378376486.csv | awk -F "=" '{print $2}'
it returns iso-8859-1,
so when I try
iconv --from-code=iso-8859-1 --to-code=utf-8 csv_import_1378382527.csv
the output is not readable.
The real encoding is cp1251, but I can't detect it. Can anyone help me solve this problem?
Answered by Kleskowy
As someone noted in the PHP docs:
If you try to use mb_detect_encoding() to detect whether a string is valid UTF-8, use the strict mode, it is pretty worthless otherwise.
So you should try passing true as the strict parameter when detecting the encoding:
mb_detect_encoding($str, mb_detect_order(), TRUE);
If you can predict some possible encodings, you can list them instead of using mb_detect_order().
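To make the suggestion concrete, here is a minimal sketch of strict detection with an explicit candidate list, followed by conversion to UTF-8. The function name csv_to_utf8 and the default candidate list are my own assumptions for illustration and are not part of the original answer. It assumes the mbstring extension is loaded. Keep in mind that single-byte encodings such as cp1251 and iso-8859-1 accept almost any byte sequence, so detection between them is a guess; the sketch only separates valid UTF-8 from the one legacy encoding you expect.

```php
<?php
// Hypothetical helper (not from the original answer): returns $raw as UTF-8.
// $candidates lists the legacy encodings you expect, in order of preference.
function csv_to_utf8(string $raw, array $candidates = ['Windows-1251']): string
{
    // Input that is already valid UTF-8 is returned unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Strict mode (third argument true) rejects any candidate for which
    // the bytes are not a valid sequence.
    $encoding = mb_detect_encoding($raw, $candidates, true);
    if ($encoding === false) {
        throw new RuntimeException('Could not detect the encoding of the input.');
    }

    return mb_convert_encoding($raw, 'UTF-8', $encoding);
}
```

With the variables from the question, this could be called as csv_to_utf8(file_get_contents($saved_file_path)). If you know the upload is always cp1251, you can skip detection and call mb_convert_encoding($raw, 'UTF-8', 'Windows-1251') directly.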