How to append data to an existing Parquet file in Java

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/39234391/

Date: 2020-08-11 20:58:50 Source: igfitidea

How to append data to an existing parquet file

java, hadoop, parquet

Asked by Krishas

I'm using the following code to create ParquetWriter and to write records to it.


ParquetWriter<GenericRecord> parquetWriter =
        new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);

But it only allows creating new files (at the specified path). Is there a way to append data to an existing Parquet file (at path)? Caching the parquetWriter is not feasible in my case.


Answered by vgunnu

Parquet is a columnar file format; it is optimized for writing all columns together. Any edit requires rewriting the whole file.

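Because the file cannot be edited in place, the usual workaround is to read the existing records, write them plus the new records into a fresh file, and then swap the files. The sketch below illustrates that rewrite pattern with plain Java I/O and a hypothetical `rewriteWithAppend` helper (a toy analogy using line-based files, not the Parquet API; with Parquet you would pair a ParquetReader with a new ParquetWriter in the same way):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class AppendByRewrite {
    // "Appends" records by rewriting the file: read the old content,
    // add the new records, write everything to a temp file, then
    // atomically replace the original.
    static void rewriteWithAppend(Path file, List<String> newRecords) throws IOException {
        List<String> all = new ArrayList<>();
        if (Files.exists(file)) {
            all.addAll(Files.readAllLines(file)); // existing records
        }
        all.addAll(newRecords);                   // records to "append"
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, all);                    // rewrite everything
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".txt");
        rewriteWithAppend(file, List.of("row1", "row2"));
        rewriteWithAppend(file, List.of("row3"));
        System.out.println(Files.readAllLines(file)); // [row1, row2, row3]
    }
}
```

Note that this read-and-rewrite cost is exactly why row-at-a-time appends are a poor fit for Parquet; the common practice is to write many small files and compact them later.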

From Wikipedia:


A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. For our example table, the data would be stored in this fashion:


10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;
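The layout above can be reproduced in a few lines of plain Java (a toy illustration of the `value:rowid` encoding from the Wikipedia example, not Parquet's actual on-disk format):

```java
import java.util.List;
import java.util.StringJoiner;

public class ColumnarLayout {
    // Serializes one column: each value is paired with the rowid
    // of the row it came from, matching the Wikipedia example.
    static String serializeColumn(List<String> values, List<String> rowIds) {
        StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < values.size(); i++) {
            sj.add(values.get(i) + ":" + rowIds.get(i));
        }
        return sj + ";";
    }

    public static void main(String[] args) {
        List<String> rowIds = List.of("001", "002", "003", "004");
        // The first column of the example table
        System.out.println(serializeColumn(List.of("10", "12", "11", "22"), rowIds));
        // prints 10:001,12:002,11:003,22:004;
    }
}
```

Appending a row means inserting one value into every one of these column runs, which is why the whole file has to be rewritten.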

Some links


https://en.wikipedia.org/wiki/Column-oriented_DBMS


https://parquet.apache.org/


Answered by bluszcz

There is a Spark SaveMode called Append: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/SaveMode.html which I believe solves your problem.


Example of use:


df.write.mode('append').parquet('parquet_data_file')