bash: Find the encoding of a text file

Question

Asked by Hakim

I have a bunch of text files with different encodings, and I want to convert all of them into UTF-8. Since there are about 1000 files, I can't do it manually. I know there are some commands in Linux that convert a file from one encoding into another, but my question is: how can I automatically detect the current encoding of a file? Concretely, I'm looking for a command (say FindEncoding($File)) to do this:

for file in *; do
    encoding=$(FindEncoding "$file")    # FindEncoding: the command I'm looking for
    uconv -f "$encoding" -t utf-8 "$file"
done

Answer 1

Answered by J. Katzwinkel

I usually do something like this:

for f in *.txt; do
    encoding=$(file -i "$f" | sed "s/.*charset=\(.*\)$/\1/")
    recode "$encoding..utf-8" "$f"
done

Note that recode overwrites the file in place when it changes the character encoding. If the text files cannot be identified by their extension, their MIME type can be determined with file -bi "$f" | cut -d ';' -f 1.
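The loop above extracts the charset by parsing file -i with sed. As a minimal sketch, assuming a build of file that supports the --mime-encoding and --mime-type options (common GNU/Linux builds do), the FindEncoding command from the question could be written as a shell function. The names find_encoding and is_text_file are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the "FindEncoding" from the question): prints the
# charset that file(1) guesses for the given path, e.g. "utf-8" or "iso-8859-1".
find_encoding() {
    # -b: omit the file name; --mime-encoding: print only the charset part
    file -b --mime-encoding -- "$1"
}

# Hypothetical helper: succeeds only when file(1) reports a text/* MIME type,
# for directories where text files cannot be picked out by extension.
is_text_file() {
    case "$(file -b --mime-type -- "$1")" in
        text/*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, for f in *; do is_text_file "$f" && echo "$f: $(find_encoding "$f")"; done lists the guessed charset of every text file in the current directory. Keep in mind that file guesses the charset from the bytes it sees, so the result is a heuristic, not a guarantee.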

It is probably also a good idea to avoid unnecessary re-encodings by checking for UTF-8 first:

if [ "$encoding" != "utf-8" ]; then
    recode "$encoding..utf-8" "$f"
fi
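Putting the detection and the UTF-8 check together gives a sketch of the whole loop. Note that it swaps recode for iconv (a different tool from the one used in this answer) and converts through a temporary file, so a failed conversion leaves the original untouched. The function name convert_all_to_utf8 is hypothetical, and the set of skipped charsets (utf-8, us-ascii, binary, unknown-8bit) is an assumption about what file reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.txt file in the current directory to
# UTF-8 with iconv, skipping files that file(1) already reports as UTF-8/ASCII.
convert_all_to_utf8() {
    local f encoding tmp
    for f in *.txt; do
        [ -f "$f" ] || continue    # no match: the glob stays literal, so skip it
        encoding=$(file -b --mime-encoding -- "$f")
        case "$encoding" in
            utf-8 | us-ascii)
                continue ;;        # already valid UTF-8, nothing to do
            binary | unknown-8bit)
                echo "skipping $f: no usable charset ($encoding)" >&2
                continue ;;
        esac
        tmp=$(mktemp) || return 1
        if iconv -f "$encoding" -t UTF-8 "$f" > "$tmp"; then
            # Copy back with cat rather than mv so the original file keeps
            # its permissions and ownership.
            cat -- "$tmp" > "$f"
        else
            echo "failed to convert $f from $encoding" >&2
        fi
        rm -f -- "$tmp"
    done
}
```

Run it as convert_all_to_utf8 inside the directory that holds the files. Backing the files up first is still wise, since a wrong guess from file can produce output that is valid UTF-8 but shows the wrong characters.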

After this treatment, some files may still be reported as us-ascii. The reason is that ASCII is a subset of UTF-8: a file containing only ASCII characters is already valid UTF-8, byte for byte, so file keeps reporting it as us-ascii until a character that ASCII cannot express is introduced. At that point, it is reported as UTF-8.
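To see why us-ascii files need no conversion, one can check whether a file decodes cleanly in a given charset. This is a minimal sketch, assuming an iconv that exits with a non-zero status on invalid input (as glibc's iconv does); is_valid_as is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if file $2 decodes cleanly as charset $1.
# iconv stops with a non-zero exit status at the first invalid byte sequence.
is_valid_as() {
    iconv -f "$1" -t UTF-8 "$2" > /dev/null 2>&1
}

dir=$(mktemp -d)
printf 'plain text\n' > "$dir/ascii.txt"   # ASCII bytes only
printf 'caf\303\251\n' > "$dir/utf8.txt"    # the "é" is the two bytes C3 A9

is_valid_as US-ASCII "$dir/ascii.txt" && echo "ascii.txt: valid ASCII"
is_valid_as UTF-8    "$dir/ascii.txt" && echo "ascii.txt: valid UTF-8 too"
is_valid_as US-ASCII "$dir/utf8.txt"  || echo "utf8.txt: not ASCII"
is_valid_as UTF-8    "$dir/utf8.txt"  && echo "utf8.txt: valid UTF-8"
```

The ASCII-only file passes both checks, which is exactly the subset relationship described above: leaving it labeled us-ascii costs nothing, because it is already UTF-8.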
