Does anyone know what Ruby calls the encoding that Notepad++ just calls "ANSI"?

Question

Asked by Owen_R

I have a bunch of .txt's that Notepad++ says (in its drop-down "Encoding" menu) are "ANSI".

They have German characters in them, [äöüß], which display fine in Notepad++.

But they don't show up right in irb when I File.read 'this is a German text example.txt' them.

So does anyone know what argument I should give Encoding.default_external=?

(I'm assuming that'd be the solution, right?)

When it's 'utf-8' or 'cp850', it reads the "ANSI" file with "äöüß" in it as "\xE4\xF6\xFC\xDF"...

(Please don't hesitate to mention apparently "obvious" things in your answers; I'm pretty much as newbish as you can be and still know just enough to ask this question.)

Answer 1

Answered by Jörg W Mittag

What they mean is probably ISO/IEC 8859-1 (aka Latin-1), ISO-8859-1, ISO/IEC 8859-15 (aka Latin-9) or Windows-1252 (aka CP 1252). All 4 of them have the ä at position 0xE4.
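
As a quick check of that claim, here is a minimal sketch that decodes the four bytes from the question under each candidate and prints the result. ISO-8859-1 appears in the list under two spellings, so three distinct names are tried. The variable names are only for illustration.

```ruby
# The raw bytes irb showed in the question, kept as binary data.
raw = "\xE4\xF6\xFC\xDF".b

candidates = %w[ISO-8859-1 ISO-8859-15 Windows-1252]

# Tag a copy of the bytes with each candidate encoding, then transcode to UTF-8.
decoded = candidates.map do |name|
  [name, raw.dup.force_encoding(name).encode('UTF-8')]
end.to_h

decoded.each { |name, text| puts "#{name}: #{text}" }
```

All three lines should show the same four characters, which is why these byte values alone cannot tell the candidates apart for this text.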

Answer 2

Answered by Love and peace - Joe Codeswell

I found the answer to this question on the Notepad++ Forum, answered in 2010 by CChris who seems to be authoritative.

Question: Encoding ANSI?

Answer:

That will be the system code page for your computer (code page 0).

More Info:

Show your current code page.

>help chcp
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

>chcp
Active code page: 437

Code Page Identifiers

Identifier  .NET Name  Additional information
437         IBM437     OEM United States
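
For comparison, Ruby can report the encodings it picked up from the environment. This is a small sketch and not part of the forum answer; 'locale', 'external' and 'filesystem' are special names that Encoding.find accepts. Keep in mind that chcp shows the console's active code page, which on many Windows setups is not the same code page that Notepad++ labels "ANSI", so the two may differ.

```ruby
# Special names understood by Encoding.find; each resolves to an Encoding object.
special_names = %w[locale external filesystem]

report = special_names.map { |name| [name, Encoding.find(name)] }.to_h

# Print each special name next to the encoding it currently stands for.
report.each { |name, enc| puts "#{name.ljust(10)} #{enc.name}" }
```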

Answer 3

Answered by Owen_R

I think it's 'cp1252', alias 'windows-1252'.

After reading Jörg's answer, I went back through the Encoding page on ruby-doc.org trying to find references to the specific encodings he mentioned, and that's when I spotted the Encoding.aliases method.
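
A short sketch of that kind of lookup, assuming a stock Ruby build: Encoding.aliases returns a hash of alias name to canonical name, and Encoding.find resolves either spelling to the same object.

```ruby
# Every alias Ruby knows for the canonical name 'Windows-1252'.
aliases_for_1252 = Encoding.aliases.select { |_alias, real| real == 'Windows-1252' }.keys
puts aliases_for_1252.inspect

# Lookups are case-insensitive and accept aliases as well as canonical names.
enc = Encoding.find('cp1252')
puts enc.name
puts enc.names.inspect
```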

So I kludged up the method at the end of this answer.

Then I looked at the output in notepad++, viewing it as both 'ANSI' and utf-8, and compared that to the output in irb...

I could only find two places in the irb output where the utf-8 file was garbled in the exact same way it appeared in notepad++ when viewing it as 'ANSI', and those places were for cp1252 and cp1254.
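
One possible explanation for the tie, sketched below: UTF-8 stores ä, ö, ü and ß as the byte pairs C3 A4, C3 B6, C3 BC and C3 9F, and cp1252 and cp1254 assign the same characters to all of those bytes. The two code pages differ at only a few positions, such as 0xF0. The sample strings are made up for illustration.

```ruby
# UTF-8 bytes of the sample text, with the encoding tag stripped.
utf8_bytes = "\u00E4\u00F6\u00FC\u00DF".b

# Misread those bytes as each code page, then transcode so they can be printed.
as_1252 = utf8_bytes.dup.force_encoding('Windows-1252').encode('UTF-8')
as_1254 = utf8_bytes.dup.force_encoding('Windows-1254').encode('UTF-8')

puts as_1252
puts as_1254
puts as_1252 == as_1254

# A byte where the two code pages disagree.
odd_byte = "\xF0".b
odd_1252 = odd_byte.dup.force_encoding('Windows-1252').encode('UTF-8')
odd_1254 = odd_byte.dup.force_encoding('Windows-1254').encode('UTF-8')
puts "0xF0 as cp1252: #{odd_1252}, as cp1254: #{odd_1254}"
```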

cp1252 is apparently my 'filesystem' encoding, so I'm going with that.

I wrote a script to make copies of all the files converted to utf-8, trying the conversion both from 1252 and from 1254.
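
A script along those lines might look like the sketch below. It is not the script from this answer; the file name ansi_sample.txt, the output names and the helper convert_to_utf8 are hypothetical, and the source encoding is passed in so that both 1252 and 1254 can be tried.

```ruby
require 'tmpdir'

# Read `src` as `from_encoding`, transcode to UTF-8, and write the copy to `dest`.
# (Hypothetical helper, named for this example only.)
def convert_to_utf8(src, dest, from_encoding)
  text = File.read(src, encoding: "#{from_encoding}:utf-8")
  File.write(dest, text, encoding: 'utf-8')
  text
end

converted = Dir.mktmpdir do |dir|
  # Stand-in for one of the "ANSI" files: the four bytes from the question.
  src = File.join(dir, 'ansi_sample.txt')
  File.binwrite(src, "\xE4\xF6\xFC\xDF".b)

  # Make one UTF-8 copy per candidate source encoding.
  %w[cp1252 cp1254].map do |enc|
    dest = File.join(dir, "utf8_from_#{enc}.txt")
    [enc, convert_to_utf8(src, dest, enc)]
  end.to_h
end

converted.each { |enc, text| puts "#{enc}: #{text}" }
```

Passing the encoding as "external:internal" lets File.read do the transcoding, so nothing global such as Encoding.default_external has to change.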

utf-8 regexes seem to work with both sets of files so far.

Now I have to try to remember what I was actually trying to accomplish before I ran into all these encoding headaches. xD

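# Reads both files once per entry in Encoding.aliases and prints the results
# side by side, so the garbling can be compared with what Notepad++ shows.
def compare_encodings(file1, file2)
    file1_probs = []
    file2_probs = []
    # Remember the starting value instead of assuming it was utf-8.
    original_external = Encoding.default_external

    File.open('encoding_test_output.txt', 'w') do |txt|
        Encoding.aliases.sort.each do |k, v|
            # File.read tags what it reads with the current default_external.
            Encoding.default_external = k
            ename = [k.downcase, v.downcase].join "  ---  "
            s = ""
            begin
                s << "#{File.read(file1)}"
            rescue
                s << "nope nope nope"
                file1_probs << ename
            end
            s << "\t| #{ename} |\t"
            begin
                s << "#{File.read(file2)}"
            rescue
                s << "nope nope nope"
                file2_probs << ename
            end
            Encoding.default_external = original_external
            txt.puts s.center(58)
            puts s.center(58)
        end
    end

    # List the encodings that raised for each file.
    puts
    puts "file1, \"#{file1}\" exceptions from trying to convert to:\n\n"
    puts file1_probs
    puts
    puts "file2, \"#{file2}\" exceptions from trying to convert to:\n\n"
    puts file2_probs
end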

compare_encodings "utf-8.txt", "np++'ANSI'.txt"
