macos: How do I determine file encoding in OS X?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/539294/

Time: 2020-10-21 06:02:47  Source: igfitidea

How do I determine file encoding in OS X?

Tags: macos, encoding, latex, utf-8

Asked by James A. Rosen

I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them.

Running cat my_file.tex shows the characters properly in Terminal. Running ls -al shows something I've never seen before: an "@" by the file listing:

-rw-r--r--@  1 me      users      2021 Feb 11 18:05 my_file.tex

(And, yes, I'm using \usepackage[utf8]{inputenc} in the LaTeX.)

I've found iconv, but that doesn't seem to be able to tell me what the encoding is -- it'll only convert once I figure it out.

Accepted answer by codelogic

The @ means that the file has extended file attributes associated with it. You can query them using the getxattr() function.

There's no definitive way to detect the encoding of a file. Read this answer; it explains why.

There's a command line tool, enca, that attempts to guess the encoding. You might want to check it out.

Answer by Tim

Using the -I (that's a capital i) option on the file command seems to show the file encoding.

file -I {filename}

Answer by Cloudranger

In Mac OS X the command file -I (capital i) will give you the proper character set so long as the file you are testing contains characters outside of the basic ASCII range.

For instance, go into Terminal and use vi to create a file, e.g. vi test.txt, then insert some characters, including an accented character (try ALT-e followed by e), and save the file.

Then type file -I test.txt and you should get a result like this:

test.txt: text/plain; charset=utf-8
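
A rough way to see this dependence on non-ASCII content for yourself, assuming your file(1) accepts the --mime-encoding long option mentioned in a later answer (used here instead of -I, which can mean something different outside OS X). The guess_encoding helper and the temp-file template are made up for this sketch:

```shell
# guess_encoding is a hypothetical helper, not part of file(1).
# It writes the given printf-style string to a temp file and
# prints the encoding that file(1) guesses for those bytes.
guess_encoding() {
  tmp=$(mktemp "${TMPDIR:-/tmp}/enc.XXXXXX")
  # "$1" is deliberately used as the printf format so that octal
  # escapes such as \303\251 become raw bytes.
  printf "$1" > "$tmp"
  file -b --mime-encoding "$tmp"   # -b: omit the file name from the output
  rm -f "$tmp"
}

guess_encoding 'cafe\n'          # ASCII bytes only
guess_encoding 'caf\303\251\n'   # \303\251 is "é" encoded as UTF-8
```

With the behaviour described above, the first call should report us-ascii and the second utf-8, although the exact labels depend on the installed version of file.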

Answer by jmettraux

vim -c 'execute "silent !echo " . &fileencoding | q' {filename}

aliased somewhere in my bash configuration as

alias vic="vim -c 'execute \"silent !echo \" . &fileencoding | q'"

so I just type

vic {filename}

On my vanilla OSX Yosemite, it yields more precise results than "file -I":

$ file -I pdfs/udocument0.pdf
pdfs/udocument0.pdf: application/pdf; charset=binary
$ vic pdfs/udocument0.pdf
latin1
$
$ file -I pdfs/t0.pdf
pdfs/t0.pdf: application/pdf; charset=us-ascii
$ vic pdfs/t0.pdf
utf-8

Answer by RPM

You can also convert a file from one character encoding to another using the following command:

iconv -f original_charset -t new_charset originalfile > newfile

e.g.

iconv -f utf-16le -t utf-8 file1.txt > file2.txt
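
Since iconv fails on byte sequences that are invalid in the source encoding, its exit status can at least rule a guess in or out before you convert anything. A minimal sketch, assuming an iconv that accepts utf-8 as an encoding name; is_valid_as is a hypothetical helper, and my_file.tex is the file name from the question:

```shell
# is_valid_as ENCODING FILE
# Hypothetical helper: succeeds if FILE decodes cleanly as ENCODING.
# It only wraps iconv's exit status and discards the converted text.
is_valid_as() {
  iconv -f "$1" -t utf-8 "$2" > /dev/null 2>&1
}

if is_valid_as utf-8 my_file.tex; then
  echo "my_file.tex is valid UTF-8"
else
  echo "my_file.tex is not valid UTF-8 (or could not be read)"
fi
```

Note that this can only reject a guess: single-byte encodings such as latin1 accept almost any byte sequence, so a successful decode does not prove the guess was right.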

Answer by bx2

Just use:

file -I <filename>

That's it.

Answer by Adam

Using the file command with the --mime-encoding option (e.g. file --mime-encoding some_file.txt) instead of the -I option works on OS X and has the added benefit of omitting the MIME type, "text/plain", which you probably don't care about.

Answer by Will Robertson

Classic 8-bit LaTeX is very restricted in which UTF8 characters it can use; it's highly dependent on the encoding of the font you're using and which glyphs that font has available.

Since you don't give a specific example, it's hard to know exactly where the problem is — whether you're attempting to use a glyph that your font doesn't have or whether you're not using the correct font encoding in the first place.

Here's a minimal example showing how a few UTF8 characters can be used in a LaTeX document:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[utf8]{inputenc}
\begin{document}
‘Héllø—thêrè.’
\end{document}

You may have more luck with the [utf8x] encoding, but be slightly warned that it's no longer supported and has some idiosyncrasies compared with [utf8] (as far as I recall; it's been a while since I've looked at it). But if it does the trick, that's all that matters for you.

Answer by Jouni K. Seppänen

The @ sign means the file has extended attributes. xattr file shows what attributes it has; xattr -l file shows the attribute values too (which can be large sometimes; try e.g. xattr /System/Library/Fonts/HelveLTMM to see an old-style font that exists in the resource fork).

Answer by dreamlax

Typing file myfile.tex in a terminal can sometimes tell you the encoding and type of file using a series of algorithms and magic numbers. It's fairly useful, but don't rely on it to provide concrete or reliable information.

A Localizable.strings file (found in localised Mac OS X applications) is typically reported to be a UTF-16 C source file.
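
One of the "magic numbers" such a tool can look for is a byte order mark (BOM) at the start of the file. A small sketch of that single check, assuming head and od are available; detect_bom is a hypothetical helper, and files without a BOM (which includes most UTF-8 files) simply report "no BOM":

```shell
# detect_bom FILE
# Hypothetical helper: names the encoding implied by a leading BOM, if any.
detect_bom() {
  # First three bytes as lowercase hex with all whitespace stripped.
  sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    efbbbf) echo "utf-8 (with BOM)" ;;
    # FF FE is also how a UTF-32LE BOM begins; this sketch does not
    # look far enough ahead to tell the two apart.
    fffe*)  echo "utf-16le" ;;
    feff*)  echo "utf-16be" ;;
    *)      echo "no BOM" ;;
  esac
}
```

For example, detect_bom Localizable.strings would be expected to print utf-16le or utf-16be if the file starts with a UTF-16 BOM.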
