Java: Solving error: unmappable character for encoding UTF8
Original question: http://stackoverflow.com/questions/18252095/
This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by user2604052
I have a Maven project in which the character encoding is set to UTF-8 in my parent POM:
<plugin>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>2.3.2</version>
    <configuration>
        <source>1.7</source>
        <target>1.7</target>
        <encoding>UTF-8</encoding>
    </configuration>
</plugin>
But some characters, such as `, are used in the Java files, and they cause compilation errors for me.
In Eclipse, I have set the encoding to UTF-8 under both Properties > Resource > Text file encoding and Window > Preferences > Workspace > Text file encoding. Please let me know how this issue can be solved.
Perl code to do the conversion:
use strict;
use warnings;
use File::Find;
use open qw/:std :utf8/;
my $dir = "D:\files";
find({ wanted => \&collectFiles}, "$dir");
sub collectFiles {
    my $filename = $_;
    if($filename =~ /.java$/){
        #print $filename."\n";
        startConversion($filename);
    }
}
sub startConversion{
    my $filename = $_;
    print $filename."\n";
    open(my $INFILE, '<:encoding(cp1252)', $filename) or die $!;
    open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;
}
Answered by amon
These two lines do not start or perform re-encoding:
open(my $INFILE, '<:encoding(cp1252)', $filename) or die $!;
open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;
Opening a file with > truncates it, which deletes its content. See the open documentation for further details.
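The truncation can be seen in a small experiment. The following is only an illustrative sketch: it works on a scratch file created with File::Temp (not on any file from the question), and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# Create a scratch file with some content (removed again at program exit).
my ($fh, $path) = tempfile( UNLINK => 1 );
print {$fh} "class Demo {}\n";
close $fh;

my $size_before = (stat $path)[7];

# Opening for writing with '>' empties the file immediately,
# before anything has been read from it.
open my $out, '>', $path;
close $out;

my $size_after = (stat $path)[7];

print "before: $size_before bytes, after: $size_after bytes\n";
```

This is why the script in the question ends up with empty files: the second open discards the data before any line has been copied.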
Rather, you have to read the data from the first file (which automatically decodes it), and write it back to another file (which automatically encodes it). Because source and target file are identical here, and because of the quirks of file handling under Windows, we should write our output to a temp file:
use autodie; # automatic error handling :)
open my $in, '<:encoding(cp1252)', $filename;
open my $out, '>:encoding(UTF-8)', "$filename~"; # or however you'd like to call the tempfile
print {$out} $_ while <$in>; # copy the file, recoding it
close $_ for $in, $out;
rename "$filename~" => $filename; # BEWARE: doesn't work across logical volumes!
If the files are small enough (hint: source code usually is), then you could also load them into memory:
use File::Slurp;
my $contents = read_file $filename, { binmode => ':encoding(cp1252)' };
write_file $filename, { binmode => ':encoding(UTF-8)' }, $contents;
Answered by David W.
If you're on Linux or Mac OS X, you can use iconv to convert files to UTF-8. Java 1.7 rejects source files containing bytes that are not valid in the declared encoding (here UTF-8), whereas Java 1.6 accepts them (although it produces a warning). I know because I have Java 1.7 on my Mac and can't compile some of our code for this reason, while Windows users and our Linux continuous build machine can, because they both still use Java 1.6.
The problem with your Perl script is that you open a file for reading and then open the same file name for writing. Opening the file for writing deletes its contents. The following version reads each file completely before reopening it for writing:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;    # open and close die on failure
use File::Find;

use constant {
    SOURCE_DIR => 'src',
};

# Collect the Java source files first, then convert them.
my @file_list;
find sub {
    return unless -f;
    return unless /\.java$/;
    push @file_list, $File::Find::name;
}, SOURCE_DIR;

for my $file ( @file_list ) {
    # Read the whole file, decoding it from cp1252.
    open my $in_fh, "<:encoding(cp1252)", $file;
    my @file_contents = <$in_fh>;
    close $in_fh;

    # Only now reopen (and truncate) the file, encoding it as UTF-8.
    open my $out_fh, ">:encoding(UTF-8)", $file;
    print {$out_fh} @file_contents;
    close $out_fh;
}
Note that I am reading the entire file into memory, which should be fine for Java source code. Even a gargantuan source file (10,000 lines long with an average line length of 120 characters) will be only about 1.2 megabytes. Unless you're using a TRS-80, a 1.2 megabyte file shouldn't be a memory issue. If you want to be strict about it, use File::Temp to create a temporary file to write to, and then use File::Copy to rename that temporary file over the original. Both are standard Perl modules.
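The stricter variant with File::Temp and File::Copy might look roughly like the sketch below. The helper name recode_file is hypothetical, and placing the temporary file in the same directory as the original is an assumption made here so that the final move stays on one volume.

```perl
use strict;
use warnings;
use autodie;
use File::Basename qw(dirname);
use File::Copy qw(move);
use File::Temp ();

# Hypothetical helper: re-encode one file from cp1252 to UTF-8
# by writing to a temporary file and then moving it into place.
sub recode_file {
    my ($file) = @_;

    open my $in, '<:encoding(cp1252)', $file;

    # Assumption: keep the temp file next to the original so the
    # move below does not have to cross volumes.
    my $tmp = File::Temp->new( DIR => dirname($file), UNLINK => 0 );
    binmode $tmp, ':encoding(UTF-8)';

    # Copy line by line; decoding and encoding happen in the I/O layers.
    print {$tmp} $_ while <$in>;
    close $in;
    close $tmp;

    # The original is replaced only after the copy is complete.
    move( $tmp->filename, $file )
        or die "Cannot replace $file: $!";
    return;
}
```

It could be called as recode_file($file) inside the for loop above. One caveat: File::Temp creates its files with restrictive permissions, so the converted file may not keep the original's mode.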
You can also enclose the entire program in the subroutine passed to find.
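A minimal sketch of that idea follows, with the conversion done directly inside the subroutine given to find. The name convert_tree and the use of the first command-line argument as the directory are assumptions made for this example.

```perl
use strict;
use warnings;
use autodie;
use File::Find;

# Hypothetical helper: convert every .java file below $dir
# from cp1252 to UTF-8, doing all the work inside the find callback.
sub convert_tree {
    my ($dir) = @_;

    find( sub {
        return unless -f $_;
        return unless /\.java$/;

        # find() has changed into the file's directory, so the bare
        # file name in $_ can be opened directly.
        open my $in, '<:encoding(cp1252)', $_;
        my @lines = <$in>;
        close $in;

        # Reopen for writing only after everything has been read.
        open my $out, '>:encoding(UTF-8)', $_;
        print {$out} @lines;
        close $out;
    }, $dir );
    return;
}

# Does nothing unless a directory is given on the command line.
convert_tree( $ARGV[0] ) if @ARGV;
```

This avoids building the intermediate file list, at the cost of mixing the directory traversal with the conversion logic.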