How to create dataframe from list in Spark SQL?
Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43444925/
Asked by Liangju Zeng
Spark version: 2.1
For example, in pyspark, I create a list:
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
then how do I create a dataframe from the test_list, where the dataframe's type is like below:
DataFrame[words: array<string>]
Answered by Pushkr
Here is how:
from pyspark.sql.types import *
cSchema = StructType([StructField("WordList", ArrayType(StringType()))])
# notice the extra square brackets around each element of the list:
# each outer element is one row, whose single column holds the array
test_list = [[['Hello', 'world']], [['I', 'am', 'fine']]]
df = spark.createDataFrame(test_list, schema=cSchema)
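To see why the extra brackets matter: createDataFrame expects one entry per row, and each row is a sequence of column values, so a single array&lt;string&gt; column means each word list must be wrapped once more. Here is a plain-Python sketch of that reshaping (no Spark needed):

```python
# Plain-Python sketch (no Spark required): createDataFrame expects one
# entry per row, where each row is a sequence of column values. For a
# single array<string> column, each word list is wrapped one more time.
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
rows = [[words] for words in test_list]  # one single-column row per list
assert rows == [[['Hello', 'world']], [['I', 'am', 'fine']]]
```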
Answered by Grant Shannon
I had to work with multiple columns and types; the example below has one string column and one integer column. A slight adjustment to Pushkr's code (above) gives:
from pyspark.sql.types import *
cSchema = StructType([StructField("Words", StringType())\
,StructField("total", IntegerType())])
test_list = [['Hello', 1], ['I am fine', 3]]
df = spark.createDataFrame(test_list, schema=cSchema)
Output:
df.show()
+---------+-----+
| Words|total|
+---------+-----+
| Hello| 1|
|I am fine| 3|
+---------+-----+
Answered by hamza tuna
You should use a list of Row objects ([Row]) to create the data frame.
from pyspark.sql import Row
spark.createDataFrame(list(map(lambda x: Row(words=x), test_list)))
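As a rough illustration (not pyspark itself): Row behaves much like a named tuple, so the map above produces one one-field record per word list. A sketch using collections.namedtuple as a stand-in:

```python
from collections import namedtuple

# Illustration only: pyspark.sql.Row behaves much like a namedtuple,
# so mapping each word list to Row(words=x) yields one-field records.
Row = namedtuple('Row', ['words'])
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
rows = [Row(words=x) for x in test_list]
assert rows[0].words == ['Hello', 'world']
assert rows[1].words == ['I', 'am', 'fine']
```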
Answered by Raju Bairishetti
You can create an RDD first from the input and then convert it to a dataframe from the constructed RDD:
import sqlContext.implicits._
val testList = Array(Array("Hello", "world"), Array("I", "am", "fine"))
// create an RDD from the list
val testListRDD = sc.parallelize(testList)
val flatTestListRDD = testListRDD.flatMap(entry => entry)
// convert the RDD to a DataFrame
val testListDF = flatTestListRDD.toDF
testListDF.show
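Note that the flatMap step flattens the nested arrays, so this route yields one word per row rather than one array per row as asked in the question. A plain-Python sketch of what that flatten does (assumed equivalent; no Spark needed):

```python
# Plain-Python sketch of the flatMap(entry => entry) step above:
# the nested word lists are flattened into individual words, so the
# resulting DataFrame would hold one word per row, not arrays.
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
flattened = [word for entry in test_list for word in entry]
assert flattened == ['Hello', 'world', 'I', 'am', 'fine']
```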