How to skip invalid characters in a stream in Java/Scala?
Original question: http://stackoverflow.com/questions/7280956/
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by yura
For example, I have the following code:
Source.fromFile(new File( path), "UTF-8").getLines()
and it throws an exception:
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)
I don't care if some lines are not read, but how can I skip the invalid characters and continue reading lines?
Answered by Joachim Sauer
You can control how charset decoding handles invalid input by calling CharsetDecoder.onMalformedInput.
Usually you won't ever see a CharsetDecoder object directly, because it is created behind the scenes for you. So if you need access to it, you'll need to use an API that lets you specify the CharsetDecoder directly (instead of just the encoding name or the Charset).
The most basic example of such an API is InputStreamReader:
InputStream in = ...;
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
decoder.onMalformedInput(CodingErrorAction.IGNORE);
Reader reader = new InputStreamReader(in, decoder);
Note that this code uses the Java 7 class StandardCharsets; for earlier versions you can simply replace it with Charset.forName("UTF-8") (or use the Charsets class from Guava).
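To make the snippet above concrete, here is a minimal, self-contained sketch that reads lines through a decoder configured with IGNORE. The class name LenientLineReader, the method readLines, and the sample bytes are assumptions made for illustration; they are not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
public class LenientLineReader {

    // Reads all lines as UTF-8, silently dropping bytes that cannot be decoded.
    public static List<String> readLines(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Genuine I/O failures still surface; only decoding errors are ignored.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample input: "ab", a stray 0xFF byte (never valid in UTF-8), "cd", newline, "ok".
        byte[] data = {'a', 'b', (byte) 0xFF, 'c', 'd', '\n', 'o', 'k'};
        System.out.println(readLines(new ByteArrayInputStream(data)));
    }
}
```

With IGNORE the bad byte simply disappears from the first line, so nothing in the output tells you that data was dropped. If that matters, CodingErrorAction.REPLACE is the alternative.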
Answered by Daniel C. Sobral
Well, if it isn't UTF-8, it is something else. The trick is finding out what that something else is, but if all you want is to avoid the errors, you can use an encoding in which no byte sequence is invalid, such as latin1:
Source.fromFile(new File( path), "latin1").getLines()
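The trade-off of the latin1 approach can be illustrated in Java. The sketch below (the class name Latin1Demo and the sample text are assumptions for illustration) shows that ISO-8859-1 maps every byte to exactly one character, so decoding never fails, but multi-byte UTF-8 sequences come out as several unrelated characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for this example only.
public class Latin1Demo {

    // Decodes bytes as ISO-8859-1 (latin1); every byte value maps to one char.
    public static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8);
        String decoded = decodeLatin1(utf8);

        // No exception is thrown, but one character has become two.
        System.out.println(decoded.length());
        System.out.println(decoded.equals("\u00C3\u00A9"));
    }
}
```

So this avoids the exception without repairing the text: any non-ASCII UTF-8 content in the file will be read as mojibake.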
Answered by Assaf Israel
I had a similar issue, and one of Scala's built-in codecs did the trick for me:
Source.fromFile(new File(path))(Codec.ISO8859).getLines()
Answered by canadiancreed
If you want to avoid invalid characters using Scala, I found that this worked for me:
import java.nio.charset.CodingErrorAction
import scala.io._

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // Implicit codec that Source.fromURL picks up below
    implicit val codec: Codec = Codec("UTF-8")
    // Substitute a replacement character instead of throwing on bad input
    codec.onMalformedInput(CodingErrorAction.REPLACE)
    codec.onUnmappableCharacter(CodingErrorAction.REPLACE)

    val dataSource = Source.fromURL("https://www.foo.com")
    for (line <- dataSource.getLines()) {
      println(line)
    }
  }
}
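For comparison, the three CodingErrorAction values can be tried side by side in plain Java. This is a sketch under stated assumptions: the class ErrorActionDemo, the method decodeUtf8, and the sample bytes are invented for illustration, and it decodes an in-memory buffer rather than a URL.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for this example only.
public class ErrorActionDemo {

    // Decodes bytes as UTF-8, applying the given action to malformed or unmappable input.
    public static String decodeUtf8(byte[] bytes, CodingErrorAction action) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Only reachable with REPORT; wrapped so callers need no checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input: 'a', a stray 0xFF byte, 'b'.
        byte[] data = {'a', (byte) 0xFF, 'b'};

        // REPLACE substitutes U+FFFD for the bad byte, giving three chars.
        System.out.println(decodeUtf8(data, CodingErrorAction.REPLACE).length());
        // IGNORE drops the bad byte, giving two chars.
        System.out.println(decodeUtf8(data, CodingErrorAction.IGNORE).length());

        try {
            decodeUtf8(data, CodingErrorAction.REPORT);
        } catch (UncheckedIOException e) {
            // REPORT raises the same kind of error as in the question.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

REPLACE leaves a visible U+FFFD marker wherever input was bad, which makes damaged lines easy to find later, whereas IGNORE removes the bytes without a trace.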

