Get list of data types from schema in Apache Spark
Original URL: http://stackoverflow.com/questions/37335307/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Asked by User2130
I have the following code in Spark-Python to get the list of names from the schema of a DataFrame, which works fine, but how can I get the list of the data types?
columnNames = df.schema.names
For example, something like:
columnTypes = df.schema.types
Is there any way to get a separate list of the data types contained in a DataFrame schema?
Answered by Daniel de Paula
Here's a suggestion:
df = sqlContext.createDataFrame([('a', 1)])
# Collect the DataType object of every field in the schema
types = [f.dataType for f in df.schema.fields]
types
> [StringType, LongType]
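If you also need the column names alongside their types, a minimal follow-up sketch reuses the same traversal (nameTypePairs is just an illustrative name):

# Pair each column name with its DataType
nameTypePairs = [(f.name, f.dataType) for f in df.schema.fields]
nameTypePairs
> [('_1', StringType), ('_2', LongType)]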
Answered by Viacheslav Shalamov
Since the question title is not python-specific, I'll add a Scala version here:
val types = df.schema.fields.map(f => f.dataType)
It will result in an array of org.apache.spark.sql.types.DataType.
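Once you have the types, a common next step is filtering columns by type. A hedged PySpark sketch of the same idea, assuming df is the DataFrame from the question (numericCols is an illustrative name):

from pyspark.sql.types import NumericType

# Keep the names of columns whose DataType subclasses NumericType
# (IntegerType, LongType, DoubleType, DecimalType, ...)
numericCols = [f.name for f in df.schema.fields
               if isinstance(f.dataType, NumericType)]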
Answered by stack0114106
Use df.dtypes:
scala> val df = Seq(("ABC",10,20.4)).toDF("a","b","c")
df: org.apache.spark.sql.DataFrame = [a: string, b: int ... 1 more field]
scala>
scala> df.printSchema
root
|-- a: string (nullable = true)
|-- b: integer (nullable = false)
|-- c: double (nullable = false)
scala> df.dtypes
res2: Array[(String, String)] = Array((a,StringType), (b,IntegerType), (c,DoubleType))
scala> df.dtypes.map(_._2).toSet
res3: scala.collection.immutable.Set[String] = Set(StringType, IntegerType, DoubleType)
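For completeness, PySpark DataFrames expose the same dtypes attribute, returning (column name, type name) pairs as plain strings. A minimal sketch, assuming an active SparkSession named spark:

# Build the same three-column DataFrame in PySpark
df = spark.createDataFrame([("ABC", 10, 20.4)], ["a", "b", "c"])
df.dtypes
> [('a', 'string'), ('b', 'bigint'), ('c', 'double')]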