Java: fixing the error "unmappable character for encoding UTF8"

Question

Asked by user2604052

I have a Maven project, and the character encoding is set to UTF-8 in my parent POM:

    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
        <encoding>UTF-8</encoding>
      </configuration>
    </plugin>

But some characters like ` or have been used in the Java files, and they are causing compilation errors for me.
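To see why javac complains, it helps to look at the raw bytes. The sketch below assumes (hypothetically, since the exact characters are not shown) that the offending characters were saved by a cp1252 (Windows-1252) editor: a curly apostrophe U+2019 is then stored as the single byte 0x92, which is not a valid UTF-8 sequence, so a compiler told to read UTF-8 cannot map it. The helper name `is_valid_utf8` is illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
# FB_CROAK makes decode() die on the first malformed sequence instead of
# silently substituting U+FFFD.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may consume its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Assumed example: the same text saved as cp1252 (0x92) and as UTF-8 (E2 80 99).
my $cp1252_bytes = encode('cp1252', "it\x{2019}s");
my $utf8_bytes   = encode('UTF-8',  "it\x{2019}s");

printf "cp1252 bytes valid as UTF-8? %d\n", is_valid_utf8($cp1252_bytes);
printf "UTF-8 bytes valid as UTF-8?  %d\n", is_valid_utf8($utf8_bytes);
```

Running the same check over each .java file's raw contents is one way to find which files actually need converting before touching them.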

In Eclipse (Properties > Resource > Text file encoding, and Window > Preferences > Workspace > Text file encoding), I have set the encoding to UTF-8. Please let me know how this issue can be solved.

Perl code to do the conversion:

use strict;
use warnings;
use File::Find;
use open qw/:std :utf8/;

my $dir = 'D:\files';   # single quotes, so "\f" is not read as a form feed

find({ wanted => \&collectFiles }, $dir);

sub collectFiles {
    my $filename = $_;   # File::Find sets $_ to the bare file name
    if ($filename =~ /\.java$/) {
        startConversion($filename);
    }
}

sub startConversion {
    my ($filename) = @_;
    print $filename . "\n";
    # Both handles refer to the same file; the '>' open truncates it
    # before anything has been read from $INFILE.
    open(my $INFILE,  '<:encoding(cp1252)', $filename) or die $!;
    open(my $OUTFILE, '>:encoding(UTF-8)',  $filename) or die $!;
}

Answer 1

Answered by amon

These two lines do not start or perform re-encoding:

open(my $INFILE,  '<:encoding(cp1252)',  $filename) or die $!;
open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;

Opening a file with > truncates it, which deletes the content. See the open documentation for further details.
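The truncation is easy to reproduce. This minimal sketch writes a throwaway temp file, then opens it for reading and for writing in the same order as the question's code; the read handle finds nothing left to read. The helper name `open_read_then_write` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: mimics the question's two open() calls on one file
# and reports how many lines were read and the file's size afterwards.
sub open_read_then_write {
    my ($content) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    print {$tmp_fh} $content;
    close $tmp_fh;

    open my $in,  '<', $path or die "read $path: $!";
    open my $out, '>', $path or die "write $path: $!";   # file is truncated here
    my @lines = <$in>;                                    # nothing left to read
    close $in;
    close $out;
    return (scalar @lines, (stat $path)[7]);
}

my ($n, $size) = open_read_then_write("class A {}\n");
printf "lines read: %d, size on disk: %d\n", $n, $size;
```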

Rather, you have to read the data from the first file (which automatically decodes it), and write it back to another file (which automatically encodes it). Because source and target file are identical here, and because of the quirks of file handling under Windows, we should write our output to a temp file:

use autodie;  # automatic error handling :)

open my $in,  '<:encoding(cp1252)', $filename;
open my $out, '>:encoding(UTF-8)', "$filename~";  # or however you'd like to call the tempfile
print {$out} $_ while <$in>;  # copy the file, recoding it
close $_ for $in, $out;

rename "$filename~" => $filename;  # BEWARE: doesn't work across logical volumes!

If the files are small enough (hint: source code usually is), then you could also load them into memory:

use File::Slurp;

my $contents = read_file $filename, { binmode => ':encoding(cp1252)' };
write_file $filename, { binmode => ':encoding(UTF-8)' }, $contents;
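File::Slurp is not a core module. If it isn't installed, the same read-everything-then-rewrite idea can be sketched with core Perl only, using `local $/` (slurp mode) to read the whole file at once. The helper name `recode_in_memory` is hypothetical; the important point is that the write handle is opened only after the old contents are safely in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: re-encode one file in place using only core Perl.
sub recode_in_memory {
    my ($filename, $from, $to) = @_;

    # Read (and decode) the whole file first.
    open my $in, "<:encoding($from)", $filename or die "read $filename: $!";
    my $contents = do { local $/; <$in> };    # undef $/ = slurp mode
    close $in;

    # Only now truncate the file and write it back, encoding as $to.
    open my $out, ">:encoding($to)", $filename or die "write $filename: $!";
    print {$out} $contents;
    close $out or die "close $filename: $!";
}

# Usage: perl recode.pl Foo.java Bar.java
recode_in_memory($_, 'cp1252', 'UTF-8') for @ARGV;
```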

Answer 2

Answered by David W.

If you're on Linux or Mac OS X, you can use iconv to convert files to UTF-8. Java 1.7 rejects characters that cannot be mapped in the source encoding (here, bytes that are not valid UTF-8), whereas Java 1.6 accepts them with a warning. I know because I have Java 1.7 on my Mac, and I can't compile some of our code because of this, while Windows users and our Linux continuous-build machine can, because they both still use Java 1.6.
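iconv is a separate command-line tool (for example `iconv -f CP1252 -t UTF-8 Foo.java > Foo.utf8.java`). If it isn't installed, the core Encode module's `from_to` performs the same byte-level conversion inside Perl. A minimal sketch, assuming the input bytes really are cp1252; the helper name `cp1252_to_utf8_bytes` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: convert a cp1252 byte string to UTF-8 bytes,
# roughly what `iconv -f CP1252 -t UTF-8` does to a file's contents.
sub cp1252_to_utf8_bytes {
    my ($bytes) = @_;
    defined from_to($bytes, 'cp1252', 'UTF-8')    # modifies $bytes in place
        or die "conversion failed";
    return $bytes;
}

# 0xE9 is "e acute" in cp1252; the result is the two-byte UTF-8 form C3 A9.
print cp1252_to_utf8_bytes("caf\xE9\n");
```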

The problem with your Perl script is that you open the same file name both for reading and for writing. When you open the file for writing, you delete its contents.

#! /usr/bin/env perl
use strict;
use warnings;
use autodie;

use File::Find;

use constant SOURCE_DIR => 'src';

# Collect every .java file under SOURCE_DIR.
my @file_list;
find(sub {
    return unless -f;
    return unless /\.java$/;
    push @file_list, $File::Find::name;
}, SOURCE_DIR);

for my $file (@file_list) {
    # Read (and decode) the whole file as cp1252.
    open my $in_fh, '<:encoding(cp1252)', $file;
    my @file_contents = <$in_fh>;
    close $in_fh;

    # Truncate and rewrite the same file, encoding it as UTF-8.
    open my $out_fh, '>:encoding(UTF-8)', $file;
    print {$out_fh} @file_contents;
    close $out_fh;
}

Note that I am reading the entire file into memory, which should be okay for Java source code. Even a gargantuan source file (10,000 lines with an average line length of 120 characters) will be just over 1.2 megabytes. Unless you're using a TRS-80, a 1.2 megabyte file shouldn't be a memory issue. If you want to be strict about it, use File::Temp to create a temporary file to write to, and then use File::Copy's move to rename that temporary file over the original. Both are standard Perl modules.
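A minimal sketch of that stricter variant follows. It assumes the temporary file is created in the same directory as the original, so the final move is a plain rename on one volume. Because File::Temp creates files readable only by their owner, the sketch copies the original permission bits onto the temp file before moving it. The helper name `recode_via_tempfile` is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(move);
use File::Temp qw(tempfile);

# Hypothetical helper: recode $file from $from to $to without truncating
# the original until the new copy is complete.
sub recode_via_tempfile {
    my ($file, $from, $to) = @_;

    # Temp file next to the original, kept after close so we can move it.
    my ($tmp_fh, $tmp_name) = tempfile(DIR => dirname($file), UNLINK => 0);
    binmode $tmp_fh, ":encoding($to)";

    open my $in, "<:encoding($from)", $file or die "read $file: $!";
    print {$tmp_fh} $_ while <$in>;    # copy line by line, recoding
    close $in;
    close $tmp_fh or die "close $tmp_name: $!";

    # Keep the original file's permission bits.
    my $mode = (stat $file)[2] & 07777;
    chmod $mode, $tmp_name or die "chmod $tmp_name: $!";

    move($tmp_name, $file) or die "move $tmp_name -> $file: $!";
}
```

Calling `recode_via_tempfile($_, 'cp1252', 'UTF-8') for @file_list;` would replace the loop in the program above.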

You could also put the entire program inside the find subroutine.
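A sketch of that single-pass version, which converts each file as soon as File::Find visits it instead of collecting a list first. It keeps the in-memory approach of the program above; the wrapper name `recode_tree` is hypothetical, and the source directory is taken from the command line.

```perl
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;

use File::Find;

# Hypothetical wrapper: convert every .java file under $dir from cp1252 to UTF-8.
sub recode_tree {
    my ($dir) = @_;
    find(sub {
        return unless -f $_ && /\.java$/;

        # File::Find has chdir'ed into the file's directory, so the bare
        # file name in $_ can be opened directly.
        open my $in, '<:encoding(cp1252)', $_;
        my @contents = <$in>;
        close $in;

        open my $out, '>:encoding(UTF-8)', $_;
        print {$out} @contents;
        close $out;
    }, $dir);
}

# Usage: perl recode_tree.pl src
recode_tree($ARGV[0]) if @ARGV;
```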
