Python: overwriting a Spark output using PySpark
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35861099/
overwriting a spark output using pyspark
Asked by Devesh
I am trying to overwrite a Spark dataframe using the following option in PySpark, but I am not successful:
spark_df.write.format('com.databricks.spark.csv').option("header", "true",mode='overwrite').save(self.output_file_path)
The mode='overwrite' option is not successful.
Answered by
Try:
spark_df.write.format('com.databricks.spark.csv') \
.mode('overwrite').option("header", "true").save(self.output_file_path)
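Why the original call fails: in the real PySpark API, option(key, value) accepts exactly one key/value pair, so passing mode='overwrite' as an extra keyword argument raises a TypeError; the save mode must be set through the separate .mode() method, which returns the writer so calls can be chained. A minimal mock (not the real pyspark class, illustration only) sketching that builder-style behavior:

```python
class MockWriter:
    """Minimal mock of PySpark's builder-style DataFrameWriter (illustration only)."""

    def __init__(self):
        self._options = {}
        self._mode = "error"  # Spark's default save mode

    def option(self, key, value):
        # Like the real API, option() accepts exactly one key/value pair,
        # so option("header", "true", mode="overwrite") raises TypeError.
        self._options[key] = value
        return self  # returning self enables method chaining

    def mode(self, save_mode):
        self._mode = save_mode
        return self


writer = MockWriter()
writer.mode("overwrite").option("header", "true")
print(writer._mode)  # overwrite

try:
    MockWriter().option("header", "true", mode="overwrite")
except TypeError:
    print("TypeError")  # the question's call fails for the same reason
```

Chaining works because each setter returns the writer itself; stuffing mode into option() cannot work, since the method signature has no such parameter.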
Answered by Davos
Spark 1.4 and above has a built-in csv function for the DataFrameWriter:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter
e.g.
spark_df.write.csv(path=self.output_file_path, header="true", mode="overwrite", sep="\t")
Which is syntactic sugar for
spark_df.write.format("csv").mode("overwrite").options(header="true", sep="\t").save(path=self.output_file_path)
I think what is confusing is finding where exactly the options are available for each format in the docs.
These write-related methods belong to the DataFrameWriter class:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter
The csv method has these options available, also available when using format("csv"):
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.csv
The way you need to supply parameters also depends on whether the method takes a single (key, value) pair or keyword arguments. It's fairly standard for the way Python works generally, using (*args, **kwargs); it just differs from the Scala syntax.
For example, the option(key, value) method takes one option at a time as a key/value pair, like option("header", "true"), and the .options(**options) method takes a bunch of keyword assignments, e.g. .options(header="true", sep="\t").
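The two calling conventions above can be sketched with plain Python functions (hypothetical stand-ins, not the real pyspark methods) to show how a single key/value signature differs from a **kwargs signature:

```python
def option(key, value):
    """Takes a single key/value pair, mirroring the option(key, value) style."""
    return {key: value}


def options(**opts):
    """Takes keyword assignments via **kwargs, mirroring the options(...) style."""
    return dict(opts)


print(option("header", "true"))          # {'header': 'true'}
print(options(header="true", sep="\t"))  # {'header': 'true', 'sep': '\t'}
```

Both produce the same kind of mapping; the difference is purely in the call syntax, which is why the Scala-flavored option() and the Pythonic options() can coexist on the same writer.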