Linux: Finding out what characters a given font supports

Question

Asked by Till Ulen

How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?

Is there a tool or a library I can use to process a .ttf or a .eot file and build a list of code points (like U+0123, U+1234, etc.) provided by the font?

Answer 1

Accepted answer by Janus Troelsen

Here is a method using the FontTools module (which you can install with something like pip install fonttools):

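#!/usr/bin/env python
from itertools import chain
import sys

from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode

# verbose=0 and allowVID=0 are legacy arguments that newer fontTools
# releases may warn about or reject, so they are omitted here.
ttf = TTFont(sys.argv[1], 0, ignoreDecompileErrors=True, fontNumber=-1)

# One (code point, glyph name, Unicode character name) tuple per cmap entry,
# collected from every cmap subtable. Building a list (rather than a bare
# iterator) lets the result be traversed more than once.
chars = list(chain.from_iterable(
    [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables))
print(chars)

# Use this for just checking if the font contains the codepoint given as
# second argument:
#char = int(sys.argv[2], 0)
#print(Unicode[char])
#print(char in (x[0] for x in chars))

ttf.close()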

The script takes the font path as its argument:

python checkfont.py /path/to/font.ttf
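
The script above prints raw tuples, while the question asks for a list in U+XXXX notation. As a minimal sketch of that last step, the helper below (format_code_points is a hypothetical name, not part of FontTools) formats any iterable of integer code points, such as the first element of each tuple in chars:

```python
def format_code_points(code_points):
    """Return sorted, de-duplicated code points in U+XXXX notation."""
    # %04X pads to at least four hex digits; larger values simply get more.
    return ["U+%04X" % cp for cp in sorted(set(code_points))]


# Hypothetical sample input standing in for the code points read from a font.
sample = [0x1234, 0x41, 0x123, 0x41, 0x1F600]
print(" ".join(format_code_points(sample)))
```

The same code point usually appears in several cmap subtables, which is why the sketch de-duplicates before sorting.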

Answer 2

Answer by hippietrail

You can do this on Linux in Perl using the Font::TTF module.

Answer 3

Answer by wschang

The character code points of a TTF/OTF font are stored in the cmap table.

You can use ttx (part of FontTools) to generate an XML representation of the cmap table.

Run the command ttx -t cmap MyFont.ttf (ttx.exe on Windows) and it should output a file MyFont.ttx. Open it in a text editor and it will show you all the character codes found in the font.
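
Rather than reading the .ttx file by eye, the code points can be pulled out with a few lines of Python. This is only a sketch: it assumes the dump contains entries of the form <map code="0x41" name="A"/>, and both the function name code_points_from_ttx and the inline sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def code_points_from_ttx(xml_text):
    """Collect the code attribute of every <map> element in a ttx cmap dump."""
    root = ET.fromstring(xml_text)
    found = set()
    for entry in root.iter("map"):
        code = entry.get("code")
        # Skip any <map> element that carries no code attribute.
        if code is not None:
            found.add(int(code, 16))  # base 16 accepts the "0x" prefix
    return sorted(found)


# Hypothetical, heavily shortened stand-in for the contents of MyFont.ttx.
SAMPLE_TTX = """<ttFont>
  <cmap>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x20" name="space"/>
      <map code="0x41" name="A"/>
    </cmap_format_4>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x41" name="A"/>
      <map code="0xe9" name="eacute"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""

print(["U+%04X" % cp for cp in code_points_from_ttx(SAMPLE_TTX)])
```

For a real file, pass the text of MyFont.ttx (for example open("MyFont.ttx", encoding="utf-8").read()) instead of the sample string.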

Answer 4

Answer by ecmanaut

I just had the same problem, and made a HOWTO that goes one step further, baking a regexp of all the supported Unicode code points.

If you just want the array of code points, you can use this when peeking at your ttx XML in Chrome devtools, after running ttx -t cmap myfont.ttf and, probably, renaming myfont.ttx to myfont.xml to invoke Chrome's XML mode:

function codepoint(node) { return Number(node.nodeValue); }
$x('//cmap/*[@platformID="0"]/*/@code').map(codepoint);

(Also relies on fonttools from gilamesh's suggestion; sudo apt-get install fonttools if you're on an Ubuntu system.)
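
The "regexp of all the supported code points" idea can be sketched in Python as follows. This is not the HOWTO's own code: to_ranges and to_char_class are hypothetical helpers that collapse a list of code points into ranges and emit a character class for Python's re module.

```python
import re


def to_ranges(code_points):
    """Collapse code points into sorted, inclusive (start, end) ranges."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges


def to_char_class(code_points):
    """Build a regex character class covering exactly the given code points."""
    def esc(cp):
        # re understands \uXXXX and \UXXXXXXXX escapes in str patterns.
        return "\\u%04X" % cp if cp <= 0xFFFF else "\\U%08X" % cp

    parts = [esc(a) if a == b else esc(a) + "-" + esc(b)
             for a, b in to_ranges(code_points)]
    if not parts:
        raise ValueError("no code points given")
    return "[" + "".join(parts) + "]"


# Hypothetical sample: pretend the font only covers A, B, C and e-acute.
supported = re.compile(to_char_class([0x41, 0x42, 0x43, 0xE9]) + "+")
print(bool(supported.fullmatch("ABC\u00e9")), bool(supported.fullmatch("ABD")))
```

Escaping every code point numerically avoids having to special-case characters such as ] or - that are meaningful inside a character class.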

Answer 5

Answer by nim

fc-query my-font.ttf will give you a map of supported glyphs and all the locales the font is appropriate for, according to fontconfig.

Since pretty much all modern Linux apps are fontconfig-based, this is much more useful than a raw Unicode list.

The actual output format is discussed here: http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html

Answer 6

Answer by Spencer

The Linux program xfd can do this. It's provided in my distro as 'xorg-xfd'. To see all characters for a font, you can run this in terminal:

xfd -fa "DejaVu Sans Mono"

Answer 7

Answer by deceleratedcaviar

If you ONLY want to "view" the fonts, the following might be helpful (if your terminal supports the font in question):

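#!/usr/bin/env python
import sys
from fontTools.ttLib import TTFont

with TTFont(sys.argv[1], 0, ignoreDecompileErrors=True) as ttf:
    for x in ttf["cmap"].tables:
        for (_, code) in x.cmap.items():
            # Turn glyph names such as "uni1E00" into the escape "\u1e00".
            # The backslash is doubled because a bare '\u' is a syntax
            # error in Python 3. Names not of the form uniXXXX pass through
            # as plain (lowercased) text.
            point = code.replace('uni', '\\u').lower()
            print("echo -e '" + point + "'")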

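An unsafe, but easy way to view (the output is piped straight into a shell, and echo -e must understand \uXXXX escapes, as the bash builtin does):

python font.py my-font.ttf | bash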

Thanks to Janus (https://stackoverflow.com/a/19438403/431528) for the answer above.

Answer 8

Answer by Neil Mayhew

The fontconfig commands can output the glyph list as a compact list of hexadecimal ranges, e.g.:

$ fc-match --format='%{charset}\n' OpenSans
20-7e a0-17f 192 1a0-1a1 1af-1b0 1f0 1fa-1ff 218-21b 237 2bc 2c6-2c7 2c9
2d8-2dd 2f3 300-301 303 309 30f 323 384-38a 38c 38e-3a1 3a3-3ce 3d1-3d2 3d6
400-486 488-513 1e00-1e01 1e3e-1e3f 1e80-1e85 1ea0-1ef9 1f4d 2000-200b
2013-2015 2017-201e 2020-2022 2026 2030 2032-2033 2039-203a 203c 2044 2070
2074-2079 207f 20a3-20a4 20a7 20ab-20ac 2105 2113 2116 2120 2122 2126 212e
215b-215e 2202 2206 220f 2211-2212 221a 221e 222b 2248 2260 2264-2265 25ca
fb00-fb04 feff fffc-fffd

Use fc-query for a .ttf file and fc-match for an installed font name.

This likely doesn't involve installing any extra packages, and doesn't involve translating a bitmap.

Use fc-match --format='%{file}\n' to check whether the right font is being matched.
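
If you need individual code points rather than ranges, the charset string is easy to expand. A minimal sketch, assuming the format shown in the output above (space-separated hexadecimal values or start-end ranges); parse_charset is a hypothetical helper, not a fontconfig API:

```python
def parse_charset(text):
    """Expand a charset string such as '20-7e a0-17f 192' into code points."""
    code_points = []
    for token in text.split():
        start, _, end = token.partition("-")
        first = int(start, 16)
        last = int(end, 16) if end else first  # a lone value is a range of one
        code_points.extend(range(first, last + 1))
    return code_points


# A few tokens copied from the OpenSans output above.
sample = "20-7e a0-17f 192 fb00-fb04"
cps = parse_charset(sample)
print(len(cps), "code points, from U+%04X to U+%04X" % (cps[0], cps[-1]))
```

Because split() treats newlines like spaces, the multi-line output of fc-match or fc-query can be passed in unchanged.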

Answer 9

Answer by zhk_tiger

Janus's answer above (https://stackoverflow.com/a/19438403/431528) works, but Python is too slow, especially for Asian fonts. It takes minutes for a 40 MB font file on my E5 computer.

So I wrote a little C++ program to do that. It depends on FreeType2 (https://www.freetype.org/). It is a VS2015 project, but it is easy to port to Linux since it is a console application.

The code can be found here: https://github.com/zhk/AllCodePoints. For the 40 MB Asian font, it takes about 30 ms on my E5 computer.

Answer 10

Answer by brunoob

If you want to get all characters supported by a font, you may use the following (based on Janus's answer):

from fontTools.ttLib import TTFont

def get_font_characters(font_path):
    with TTFont(font_path) as font:
        characters = {chr(y[0]) for x in font["cmap"].tables for y in x.cmap.items()}
    return characters
