Python: Add new rows to a pyspark DataFrame

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/52685609/

Date: 2020-08-19 20:09:57  Source: igfitidea

Add new rows to pyspark Dataframe

python, apache-spark, pyspark

Asked by Roushan

I am very new to pyspark but familiar with pandas. I have a pyspark DataFrame:


# instantiate Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
     (1, 2, 0),
     (2, 0, 1)
]

# create DataFrame
df = spark.createDataFrame(vals, columns)

I want to add a new row (4, 5, 7) so that it will output:


df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
|  1|   2|   0|
|  2|   0|   1|
|  4|   5|   7|
+---+----+----+

Answered by cronoik

As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a pyspark example:


# if not already created automatically, instantiate a SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]

df = spark.createDataFrame(vals, columns)

newRow = spark.createDataFrame([(4,5,7)], columns)
appended = df.union(newRow)
appended.show()

Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html


Answered by thebluephantom

From something I did, using union, showing a partial block of code - of course you need to adapt it to your own situation:


// imports needed for this snippet
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val dummySchema = StructType(
  StructField("phrase", StringType, true) :: Nil)
var dfPostsNGrams2 = spark.createDataFrame(sc.emptyRDD[Row], dummySchema)
for (i <- i_grams_Cols) {
  val nameCol = col(i)
  dfPostsNGrams2 = dfPostsNGrams2.union(dfPostsNGrams.select(explode(nameCol).as("phrase")).toDF)
}

Union of a DF with itself is the way to go.
