bash: how to detect the encoding of an uploaded CSV file

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18636675/

Date: 2020-09-18 06:28:26  Source: igfitidea

How to detect the encoding of an uploaded CSV file

php linux bash csv character-encoding

Asked by Tony-M

I have a data.csv file that must be uploaded to the server, parsed ....

This file can have different encodings. I must detect the encoding and convert the file to UTF-8.

At the moment the PHP function mb_detect_encoding always returns UTF-8. I tried:

<?php 
mb_detect_encoding(file_get_contents($_FILES["csv_uploadfile"]["tmp_name"]));

or

<?php 
mb_detect_encoding(file_get_contents($saved_file_path));

In both cases mb_detect_encoding returns UTF-8.

If I use the bash command

$ file -bi csv_import_1378376486.csv | awk -F "=" '{print $2}'

it returns iso-8859-1

So when I try

iconv --from-code=iso-8859-1 --to-code=utf-8 csv_import_1378382527.csv 

the output is not readable.

The real encoding is cp1251, but I can't detect it. Can anyone help me solve this problem?

Answered by Kleskowy

As someone noticed in the PHP docs here:

If you try to use mb_detect_encoding() to detect whether a string is valid UTF-8, use the strict mode, it is pretty worthless otherwise.

So you should try passing true as the strict parameter when detecting the encoding:

mb_detect_encoding($str, mb_detect_order(), TRUE);

If you can predict some possible encodings, you can list them instead of using mb_detect_order().
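
The same idea (strict validation against a short list of expected encodings) can also be sketched on the bash side from the question. This is only an illustration, not part of the original answer: detect_encoding is a hypothetical helper name, and it assumes an iconv that exits non-zero on invalid input and knows the CP1251 name. UTF-8 has to be listed first, because single-byte encodings such as CP1251 accept almost any byte sequence.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): print the first
# candidate encoding under which the whole file decodes without errors.
# Usage: detect_encoding FILE ENCODING...
detect_encoding() {
    local file=$1
    shift
    local enc
    for enc in "$@"; do
        # iconv fails as soon as it meets a byte sequence invalid in $enc
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    # none of the candidates decoded the file cleanly
    return 1
}

# Example (data.csv is a placeholder file name):
#   enc=$(detect_encoding data.csv UTF-8 CP1251) &&
#       iconv -f "$enc" -t UTF-8 data.csv > data.utf8.csv
```

Note that this cannot tell two single-byte encodings apart (for example CP1251 and ISO-8859-1), so the candidate list should contain only the encodings you actually expect, strictest first.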
