Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4727901/

Time: 2020-10-30 07:44:57  Source: igfitidea

Why does checking whether a file exists in hadoop cause a NullPointerException?

java hadoop

Asked by jonderry

I'm trying to create or open a file to store some output in HDFS, but I'm getting a NullPointerException when I call the exists method in the second-to-last line of the code snippet below:

DistributedFileSystem dfs = new DistributedFileSystem();
Path path = new Path("/user/hadoop-user/bar.txt");
if (!dfs.exists(path)) dfs.createNewFile(path);
FSDataOutputStream dos = dfs.create(path);

Here is the stack trace:

java.lang.NullPointerException
        at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:80)
        at ClickViewSessions$ClickViewSessionsMapper.map(ClickViewSessions.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

What could the problem be?

Answered by bajafresh4life

I think the preferred way of doing this is:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://mynamenodehost:9000");
FileSystem fs = FileSystem.get(conf);
Path path = ...

That way you don't tie your code to a particular implementation of FileSystem; plus you don't have to worry about how each implementation of FileSystem is initialized.

Answered by Oleg Ryaboy

The default constructor DistributedFileSystem() does not perform initialization; you need to call dfs.initialize() explicitly.

The reason you are getting a NullPointerException is that DistributedFileSystem internally uses an instance of DFSClient. Since you did not call initialize(), that DFSClient instance is null. getFileStatus() calls dfsClient.getFileInfo(getPathName(f)), which throws a NullPointerException because dfsClient is null.
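
To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern in plain Java. It does not use Hadoop at all: ToyFileSystem and ToyClient are hypothetical stand-ins for DistributedFileSystem and DFSClient, and the URI string is a made-up placeholder. It only illustrates how a field that is assigned in initialize() rather than in the constructor leads to a NullPointerException when a method is called too early.

```java
import java.util.HashSet;
import java.util.Set;

public class UninitializedClientDemo {

    // Hypothetical stand-in for DFSClient: remembers which paths exist.
    static class ToyClient {
        private final Set<String> files = new HashSet<>();

        boolean hasFile(String path) {
            return files.contains(path);
        }

        void addFile(String path) {
            files.add(path);
        }
    }

    // Hypothetical stand-in for DistributedFileSystem.
    static class ToyFileSystem {
        // Not set by the constructor; stays null until initialize() runs.
        private ToyClient client;

        void initialize(String uri) {
            // A real file system would connect to the given URI here;
            // this toy version just creates an in-memory client.
            this.client = new ToyClient();
        }

        boolean exists(String path) {
            // Dereferences client, so this throws NullPointerException
            // if initialize() has not been called yet.
            return client.hasFile(path);
        }

        void createNewFile(String path) {
            client.addFile(path);
        }
    }

    public static void main(String[] args) {
        String path = "/user/hadoop-user/bar.txt";
        ToyFileSystem fs = new ToyFileSystem();

        try {
            fs.exists(path); // called before initialize()
        } catch (NullPointerException e) {
            System.out.println("NullPointerException before initialize()");
        }

        fs.initialize("hdfs://namenode.example:9000"); // placeholder URI
        System.out.println(fs.exists(path)); // no file created yet
        fs.createNewFile(path);
        System.out.println(fs.exists(path)); // file now recorded
    }
}
```

The toy class fails in the same place the stack trace in the question does: inside the method that first touches the internal client.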

See https://trac.declarativity.net/browser/src/hdfs/org/apache/hadoop/dfs/DistributedFileSystem.java?rev=3593

Answered by Rocky111

This should work:

DistributedFileSystem dfs = new DistributedFileSystem();
dfs.initialize(new URI("URI to HDFS"), new Configuration());
Path path = new Path("/user/hadoop-user/bar.txt");
if (!dfs.exists(path)) dfs.createNewFile(path);
FSDataOutputStream dos = dfs.create(path);