Hadoop FileSystem closed exception when doing BufferedReader.close()

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/20057881/



Tags: java, hadoop, mapreduce, hdfs

Asked by Venk K

From within the Reduce setup method, I am trying to close a BufferedReader object and getting a FileSystem closed exception. It does not happen all the time. This is the piece of code I used to create the BufferedReader:


    String fileName = <some HDFS file path>
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path hdfsPath = new Path(fileName);
    FSDataInputStream in = fs.open(hdfsPath);
    InputStreamReader inputStreamReader = new InputStreamReader(in);
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

I read contents from the bufferedReader and once all the reading is done, I close it.


This is the piece of code that reads it:


    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // Do something
    }

This is the piece of code that closes the reader:


    if (bufferedReader != null) {
        bufferedReader.close();
    }

This is the stack trace for the exception that happens when I call bufferedReader.close():


    I, [2013-11-18T04:56:51.601135 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
    I, [2013-11-18T04:56:51.601168 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:522)
    I, [2013-11-18T04:56:51.601199 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.FilterInputStream.close(FilterInputStream.java:155)
    I, [2013-11-18T04:56:51.601230 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
    I, [2013-11-18T04:56:51.601263 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
    I, [2013-11-18T04:56:51.601356 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.InputStreamReader.close(InputStreamReader.java:182)
    I, [2013-11-18T04:56:51.601395 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.BufferedReader.close(BufferedReader.java:497)


I am not sure why this exception is happening. This is not multithreaded, so I do not expect a race condition of any sort. Can you please help me understand?


Thanks,


Venk


Answered by Joe K

There is a little-known gotcha with the Hadoop FileSystem API: FileSystem.get returns the same object for every invocation with the same filesystem. So if one is closed anywhere, they are all closed. You could debate the merits of this decision, but that's the way it is.

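To make the caching behavior concrete, here is a minimal sketch (the class name and standalone setup are illustrative, not from the original answer): both calls return the same cached instance, so closing either reference closes it for everyone.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FsCacheDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Both calls go through FileSystem's internal cache, so fs1 and
            // fs2 refer to the very same object for the same
            // URI/configuration/user combination.
            FileSystem fs1 = FileSystem.get(conf);
            FileSystem fs2 = FileSystem.get(conf);
            System.out.println(fs1 == fs2); // expected to print "true"

            // Closing either reference closes the single shared instance;
            // any stream still backed by it will afterwards fail with
            // "java.io.IOException: Filesystem closed".
            fs1.close();
        }
    }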

So, if you attempt to close your BufferedReader, and it tries to flush out some data it has buffered, but the underlying stream is connected to a FileSystem that is already closed, you'll get this error. Check your code for any other places you are closing a FileSystem object, and look for race conditions. Also, I believe Hadoop itself will at some point close the FileSystem, so to be safe, you should probably only be accessing it from within the Reducer's setup, reduce, or cleanup methods (or configure, reduce, and close, depending on which API you're using).

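As a hedged illustration of that advice, the sketch below reads a side file from within setup() of a new-API Reducer and closes only the reader, never the shared FileSystem. The class name, key/value types, and side-file path are all hypothetical.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SideFileReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException {
            Configuration conf = context.getConfiguration();
            // Shared, cached instance -- the framework may hold it too.
            FileSystem fs = FileSystem.get(conf);

            Path sideFile = new Path("/tmp/side-data.txt"); // hypothetical path
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(sideFile)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // cache side data in a field for use in reduce()
                }
            }
            // Deliberately no fs.close() here: closing it would also close
            // the cached instance that everyone else (including Hadoop
            // itself) is sharing.
        }
    }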

Answered by Marius Soutier

You have to use FileSystem.newInstance to avoid using the shared connection (as described by Joe K). It will give you a unique, non-shared instance.

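A minimal sketch of how that could look for the question's reading code (variable names follow the question; the surrounding try/finally structure is illustrative): newInstance bypasses the cache, so this copy can be closed without affecting anyone else.

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.newInstance(conf); // private instance, not the cached one
    try {
        Path hdfsPath = new Path(fileName);
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(fs.open(hdfsPath)))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                // Do something
            }
        }
    } finally {
        fs.close(); // safe: closes only this private instance
    }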