macos MATLAB: how to display UTF-8-encoded text read from file?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/6863147/
MATLAB: how to display UTF-8-encoded text read from file?
Asked by kjo
The gist of my question is this:
How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
????????£?|???±?2?3?′?μ???·???1?o??????????????????
>>
As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads the string in from the file, it no longer displays it correctly. For example:
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>>
Thanks!
Answered by Amro
I present below my findings after doing some digging... Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
?????
First, we read the files:
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';   %# read bytes as a row vector
fclose(fid);
%# decode as unicode string
str = native2unicode(b,'UTF-8');
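One way to convince yourself that the decoding step above is lossless is a round-trip check. The following is only a minimal sketch: the test string, the temporary file name, and the variable `roundTripOk` are made up for illustration and are not part of the original answer.

```matlab
% Write a small UTF-8 test file from known code points (hypothetical data).
expected = char([915 916 920 945 946 947]);   % Greek letters: GAMMA DELTA THETA alpha beta gamma
fname = [tempname '.txt'];                    % temporary file name
fid = fopen(fname, 'w');
fwrite(fid, unicode2native(expected, 'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes back and decode them, as in the answer.
fid = fopen(fname, 'r');
b = fread(fid, '*uint8')';                    % row vector of bytes
fclose(fid);
str = native2unicode(b, 'UTF-8');

% The decoded string should match the original code points,
% and re-encoding it should reproduce the file's bytes.
roundTripOk = isequal(double(str), double(expected)) && ...
              isequal(unicode2native(str, 'UTF-8'), b);
delete(fname);
```

If `roundTripOk` is true, the string in memory is intact and any garbage on screen is a display problem rather than a decoding problem.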
If you try to print the string, you get a bunch of nonsense:
>> str
str =
Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they lie outside the ASCII range (the last two are the non-printable CR-LF line ending):
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
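The same inspection can be done programmatically. This is a small sketch on a shortened, made-up string (three Greek letters plus CR-LF); the names `nonAscii`, `isLineEnd`, and `clean` are illustrative only.

```matlab
% Shortened stand-in for the decoded string: three Greek letters plus CR-LF.
str = char([915 916 969 13 10]);

codes = double(str);                    % Unicode code point of each character
nonAscii = codes > 127;                 % true where a character is outside ASCII
isLineEnd = ismember(codes, [10 13]);   % true for LF (10) and CR (13)

% Drop the line-ending characters before handing the string to a GUI element.
clean = str(~isLineEnd);
```

Stripping the CR-LF pair first avoids stray control characters showing up when the string is later displayed.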
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
One trick I found is to use the embedded Java capability:
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
As I was preparing to write the above, I found an alternative solution. We can use the undocumented DefaultCharacterSet feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):
feature('DefaultCharacterSet','UTF-8');
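Because this setting affects the whole session, it may be worth remembering the previous value and restoring it afterwards. The sketch below assumes that calling `feature('DefaultCharacterSet')` with no second argument returns the current charset; since the feature is undocumented, this behavior is an assumption and may differ between releases.

```matlab
% Remember the current charset (assumed query form of an undocumented feature).
oldCharset = feature('DefaultCharacterSet');

% Switch to UTF-8 for the duration of the Unicode work.
feature('DefaultCharacterSet', 'UTF-8');

% ... read and display the Unicode strings here ...

% Restore the original setting so the rest of the session is unaffected.
feature('DefaultCharacterSet', oldCharset);
restored = strcmp(feature('DefaultCharacterSet'), oldCharset);
```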
Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
?“?”?????????£?|???±?2?3?′?μ???·???1?o?????????????????…???????‰
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
uicontrol('Style','text', 'String',str, ...
'Units','normalized', 'Position',[0 0 1 1], ...
'FontName','Arial Unicode MS', 'FontSize',30)
Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage.
As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.