Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/6863147/

Date: 2020-10-21 08:15:06  Source: igfitidea

MATLAB: how to display UTF-8-encoded text read from file?

Tags: macos, user-interface, matlab, unicode, utf-8

Asked by kjo

The gist of my question is this:

How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?

Details:

I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:

>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc

enc =

UTF-8

>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}

ans =

????????£?|???±?2?3?′?μ???·???1?o??????????????????
>> 

As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads the string in from the file, it no longer displays it correctly. For example:

>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'

pasted =

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> 

Thanks!

Answered by Amro

After doing some digging, here are my findings. Consider these test files:

a.txt

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

b.txt

?????

First, we read the file:

% Open the file in binary mode and read its raw bytes as a uint8 row vector
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';   % read bytes
fclose(fid);

% Decode the UTF-8 bytes into a MATLAB (UTF-16) char array
str = native2unicode(b, 'UTF-8');
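To reproduce this without depending on how an editor saved a.txt, one option is to generate the test file from code points and read it back with the same fread/native2unicode steps. A minimal sketch, assuming a writable current folder; the file name a_test.txt is hypothetical, and the code points are the ones listed by double(str) further down.

```matlab
% Greek test string as code points (no literal non-ASCII characters in
% this m-file), followed by a CR-LF line ending
codes = [915 916 920 923 926 928 931 934 937, 945:958, 960:969, 13 10];
str0 = char(codes);

% Encode to UTF-8 bytes and write them unchanged
bytes = unicode2native(str0, 'UTF-8');
fid = fopen('a_test.txt', 'w');
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read the raw bytes back and decode them, as in the answer
fid = fopen('a_test.txt', 'r');
b = fread(fid, '*uint8')';
fclose(fid);
str = native2unicode(b, 'UTF-8');

% Every code point should survive the round trip; each Greek letter
% takes 2 bytes in UTF-8, so 33 letters + CR + LF = 68 bytes
roundTripOk = isequal(double(str), codes)
nBytes = numel(b)

delete('a_test.txt');   % remove the hypothetical test file
```

If roundTripOk is true, the decoding itself is fine and any garbage you see is purely a display problem.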

If you try to print the string, you get a bunch of nonsense:

>> str
str =

Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they all lie outside the ASCII range (the last two are the non-printable CR-LF line endings):

>> double(str)
ans =
  Columns 1 through 13
   915   916   920   923   926   928   931   934   937   945   946   947   948
  Columns 14 through 26
   949   950   951   952   953   954   955   956   957   958   960   961   962
  Columns 27 through 35
   963   964   965   966   967   968   969    13    10
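To make that check easier to read, the code points can be printed in the usual U+XXXX notation, and the trailing CR-LF can be dropped before the string is handed to a GUI. A small sketch; here str is rebuilt from the code points listed above so the snippet runs on its own.

```matlab
% Stand-in for the decoded string: Greek letters plus CR-LF, from code points
str = char([915 916 920 923 926 928 931 934 937, 945:958, 960:969, 13 10]);

% Print each code point in U+XXXX notation
fprintf('%s\n', sprintf('U+%04X ', double(str)));

% Count characters outside the 7-bit ASCII range
nonAscii = nnz(double(str) > 127)

% Drop the CR (13) and LF (10) line-ending characters before display
str(str == 13 | str == 10) = [];
numel(str)
```

The first entries print as U+0393 U+0394 U+0398 (capital Gamma, Delta, Theta), which can be looked up directly in the Unicode charts.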

Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:

figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)

One trick I found is to use the embedded Java capability:

% Java Swing: show the string in a JLabel inside a JFrame
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
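If a Java component still shows boxes or question marks, it may be worth checking whether the chosen font actually has glyphs for the string. A sketch using the standard java.awt.Font.canDisplayUpTo(String) method, which returns -1 when every character can be displayed and otherwise the 0-based index of the first one that cannot. It assumes MATLAB's JVM is running; Java may fall back to a default family if 'Arial Unicode MS' is not installed, which getFamily reveals.

```matlab
str = char([915 916 920 923 926 928 931 934 937]);   % Greek capitals from code points

font = java.awt.Font('Arial Unicode MS', java.awt.Font.PLAIN, 30);
idx = font.canDisplayUpTo(str);   % -1: every character has a glyph (0-based index otherwise)

if idx == -1
    disp('Font has glyphs for the whole string')
else
    fprintf('First missing glyph at character %d\n', idx + 1);   % convert to 1-based
end

% Family actually used; may differ from the requested name if the font is missing
fontFamily = char(font.getFamily())
```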


As I was preparing to write the above, I found an alternative solution. We can use the undocumented DefaultCharacterSet feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):

feature('DefaultCharacterSet','UTF-8');

Now, with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string at the prompt (note that DISP is still incapable of printing Unicode):

>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> disp(str)
?“?”?????????£?|???±?2?3?′?μ???·???1?o?????????????????…???????‰
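Since DefaultCharacterSet is undocumented and changes behavior for the whole session, it may be safer to restore the previous value when done. A sketch, assuming that feature('DefaultCharacterSet') called with a single argument returns the current setting as a char vector; being undocumented, this may differ between MATLAB releases.

```matlab
% Remember the current setting (undocumented; assumed to return a char vector)
oldCharset = feature('DefaultCharacterSet');

% onCleanup restores the old setting when restoreCharset is cleared
restoreCharset = onCleanup(@() feature('DefaultCharacterSet', oldCharset));

feature('DefaultCharacterSet', 'UTF-8');
str = char([915 916 920 923])   % Gamma Delta Theta Lambda, built from code points

clear restoreCharset   % triggers the cleanup and restores oldCharset
```

Inside a function, the explicit clear is unnecessary: the onCleanup object restores the setting when the function returns or errors.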

And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):

uicontrol('Style','text', 'String',str, ...
    'Units','normalized', 'Position',[0 0 1 1], ...
    'FontName','Arial Unicode MS', 'FontSize',30)

Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage:


As a side note: it is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I used Notepad++ instead, with files encoded as UTF-8 without BOM.
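One way to sidestep the editor issue altogether is to keep the m-file pure ASCII and build non-ASCII strings from their code points at run time. A minimal sketch; the code points are the Greek ones from a.txt, and the variable names are illustrative.

```matlab
% Keep the m-file pure ASCII: build non-ASCII strings from code points
% at run time, so the editor's save encoding does not matter.
upperGreek = char([915 916 920 923 926 928 931 934 937]);   % Gamma ... Omega
lowerGreek = char([945:958, 960:969]);                      % alpha ... omega (no omicron, as in a.txt)

% Hex code points can be easier to look up: U+0393 is capital Gamma
gammaChar = char(hex2dec('0393'));

label = [upperGreek, ' ', lowerGreek];
```

The resulting label can then be passed to uicontrol or a Java component exactly like a string decoded from a file.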