Python: Add new rows to pyspark Dataframe
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, keep the original link, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/52685609/
Add new rows to pyspark Dataframe
Asked by Roushan
I am very new to pyspark but familiar with pandas. I have a pyspark DataFrame:
from pyspark.sql import SparkSession

# instantiate Spark
spark = SparkSession.builder.getOrCreate()
# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
    (1, 2, 0),
    (2, 0, 1)
]
# create DataFrame
df = spark.createDataFrame(vals, columns)
I wanted to add a new row (4, 5, 7) so it will output:
df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
| 1| 2| 0|
| 2| 0| 1|
| 4| 5| 7|
+---+----+----+
Answered by cronoik
As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a pyspark example:
from pyspark.sql import SparkSession

# if not already created automatically, instantiate the SparkSession
spark = SparkSession.builder.getOrCreate()
columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]
df = spark.createDataFrame(vals, columns)
newRow = spark.createDataFrame([(4,5,7)], columns)
appended = df.union(newRow)
appended.show()
Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
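A note that may help: union resolves columns by position, not by name, so the new row's values must be listed in the same column order as df. If the column order could differ, DataFrame.unionByName (available since Spark 2.3) is a safer choice. Below is a minimal sketch reusing the example data from the answer; the deliberately reordered newRow is my own illustration, not part of the original answer:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

columns = ['id', 'dogs', 'cats']
df = spark.createDataFrame([(1, 2, 0), (2, 0, 1)], columns)

# the new row is defined with its columns in a different order on purpose
newRow = spark.createDataFrame([(7, 5, 4)], ['cats', 'dogs', 'id'])

# unionByName aligns columns by name instead of position
appended = df.unionByName(newRow)
appended.show()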
Answered by thebluephantom
From something I did, using union, showing a partial block of code - you will of course need to adapt it to your own situation:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val dummySchema = StructType(
  StructField("phrase", StringType, true) :: Nil)
var dfPostsNGrams2 = spark.createDataFrame(sc.emptyRDD[Row], dummySchema)
for (i <- i_grams_Cols) {
  val nameCol = col(i)
  dfPostsNGrams2 = dfPostsNGrams2.union(dfPostsNGrams.select(explode(nameCol).as("phrase")).toDF)
}
Union of a DF with itself is the way to go.
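For readers who, like the asker, are more at home in pyspark, here is a rough Python sketch of the same pattern (start from an empty DataFrame and union exploded columns in a loop); the sample data, column names, and variable names below are made-up placeholders, not part of the original answer:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# toy stand-in for dfPostsNGrams: each column holds an array of phrases
df_posts_ngrams = spark.createDataFrame(
    [(["a b", "b c"], ["a b c"])], ["bigrams", "trigrams"])
i_grams_cols = ["bigrams", "trigrams"]  # stand-in for i_grams_Cols

# start from an empty DataFrame with the target schema, then union in a loop
schema = StructType([StructField("phrase", StringType(), True)])
df_phrases = spark.createDataFrame([], schema)
for name in i_grams_cols:
    df_phrases = df_phrases.union(
        df_posts_ngrams.select(explode(col(name)).alias("phrase")))

df_phrases.show()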