Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me) at StackOverflow.
Original URL: http://stackoverflow.com/questions/22997137/
Append data to existing file in HDFS Java
Asked by kennechu
I'm having trouble appending data to an existing file in HDFS. I want it so that if the file exists, a line is appended; if not, a new file is created with the given name.
Here's my method to write into HDFS.
if (!file.exists(path)) {
    file.createNewFile(path);
}
FSDataOutputStream fileOutputStream = file.append(path);
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
br.append("Content: " + content + "\n");
br.close();
This method does write into HDFS and creates a file, but as I mentioned, it does not append.
This is how I test my method:
RunTimeCalculationHdfsWrite.hdfsWriteFile("RunTimeParserLoaderMapperTest2", "Error message test 2.2", context, null);
The first param is the name of the file, the second the message and the other two params are not important.
So, does anyone have an idea what I'm missing or doing wrong?
Answered by Chaos
HDFS does not allow append operations. One way to implement the same functionality as appending is:
- Check if the file exists.
- If the file doesn't exist, create a new file and write to it.
- If the file exists, create a temporary file.
- Read each line from the original file and write that same line to the temporary file (don't forget the newline).
- Write the lines you want to append to the temporary file.
- Finally, delete the original file and move (rename) the temporary file to the original file name.
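The steps above can be sketched roughly as follows. This is a minimal sketch using the Hadoop `FileSystem` API; the class name, method name, and the `.tmp` suffix are illustrative, and error handling (e.g. cleaning up the temp file on failure) is omitted:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// "Append by rewrite": copy the original plus the new line into a temp file,
// then replace the original with it.
public class HdfsAppendByRewrite {
    public static void appendLine(FileSystem fs, Path file, String line) throws IOException {
        if (!fs.exists(file)) {
            // File doesn't exist yet: just create it and write the line.
            try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(fs.create(file)))) {
                w.write(line);
                w.newLine();
            }
            return;
        }
        Path tmp = new Path(file.toString() + ".tmp"); // illustrative temp name
        try (BufferedReader r = new BufferedReader(new InputStreamReader(fs.open(file)));
             BufferedWriter w = new BufferedWriter(new OutputStreamWriter(fs.create(tmp, true)))) {
            String old;
            while ((old = r.readLine()) != null) { // copy existing content
                w.write(old);
                w.newLine();                       // don't forget the newline
            }
            w.write(line);                         // then the line to append
            w.newLine();
        }
        fs.delete(file, false);                    // replace original with temp file
        fs.rename(tmp, file);
    }
}
```

Note that this rewrites the whole file on every "append", so it gets expensive for large files.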
Answered by Mikhail Golubtsov
Actually, you can append to an HDFS file:
From the perspective of the Client, the append operation first calls append on DistributedFileSystem; this operation returns a stream object FSDataOutputStream out. If the Client needs to append data to this file, it can call out.write to write, and out.close to close.
I checked the HDFS sources; there is a DistributedFileSystem#append method:
FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException
For details, see the presentation.
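A minimal usage sketch of this append API (assuming append is enabled on the cluster and the file already exists; the path `/test/doc.txt` is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        // Obtain the FileSystem for the configured fs.defaultFS (an HDFS cluster).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // FileSystem#append returns an FSDataOutputStream positioned at end-of-file.
        try (FSDataOutputStream out = fs.append(new Path("/test/doc.txt"))) {
            out.writeBytes("one more line\n"); // write through the returned stream
        } // closing the stream completes the append
    }
}
```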
You can also append through the command line:
hdfs dfs -appendToFile <localsrc> ... <dst>
Add lines directly from stdin:
echo "Line-to-add" | hdfs dfs -appendToFile - <dst>
Answered by Lovish chaudhary
Solved..!!
Append is supported in HDFS.
You just have to do some configuration and add simple code, as shown below:
Step 1: Set dfs.support.append to true in hdfs-site.xml:
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
Stop all your daemon services using stop-all.sh and restart them using start-all.sh.
Step 2 (Optional): Only if you have a single-node cluster do you need to set the replication factor to 1, as below:
Through command line :
./hdfs dfs -setrep -R 1 filepath/directory
Or you can do the same at run time through Java code, via the FileSystem#setReplication API:
fileSystem.setReplication(new Path(filePath), (short) 1);
Step 3: Code for Creating/appending data into the file :
public void createAppendHDFS() throws IOException {
    Configuration hadoopConfig = new Configuration();
    hadoopConfig.set("fs.defaultFS", hdfsuri);
    FileSystem fileSystem = FileSystem.get(hadoopConfig);
    String filePath = "/test/doc.txt";
    Path hdfsPath = new Path(filePath);
    fileSystem.setReplication(hdfsPath, (short) 1); // optional: single-node cluster only
    FSDataOutputStream fileOutputStream = null;
    try {
        if (fileSystem.exists(hdfsPath)) {
            fileOutputStream = fileSystem.append(hdfsPath);
            fileOutputStream.writeBytes("appending into file. \n");
        } else {
            fileOutputStream = fileSystem.create(hdfsPath);
            fileOutputStream.writeBytes("creating and writing into file\n");
        }
    } finally {
        // Close the stream before the file system it was opened from.
        if (fileOutputStream != null) {
            fileOutputStream.close();
        }
        if (fileSystem != null) {
            fileSystem.close();
        }
    }
}
Kindly let me know if you need any other help.
Cheers.!!