
Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31537420/


Usage of spark DataFrame "as" method

scala, apache-spark, dataframe, apache-spark-sql

Asked by Prikso NAI

I am looking at the spark.sql.DataFrame documentation.


There is

def as(alias: String): DataFrame
    Returns a new DataFrame with an alias set.
    Since
        1.3.0 

What is the purpose of this method? How is it used? Can there be an example?


I have not managed to find anything about this method online and the documentation is pretty non-existent. I have not managed to make any kind of alias using this method.


Answered by zero323

Spark <= 1.5


It is more or less equivalent to SQL table aliases:


SELECT *
FROM table AS alias;

Example usage adapted from the PySpark alias documentation:


import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(
    Person("Alice", 2) :: Person("Bob", 5) :: Nil)

// Alias the same DataFrame twice so the self-join can tell the two sides apart
val df_as1 = df.as("df1")
val df_as2 = df.as("df2")

val joined_df = df_as1.join(
    df_as2, col("df1.name") === col("df2.name"), "inner")

// Qualified references like "df1.name" resolve against the aliases
joined_df.select(
    col("df1.name"), col("df2.name"), col("df2.age")).show

Output:


+-----+-----+---+
| name| name|age|
+-----+-----+---+
|Alice|Alice|  2|
|  Bob|  Bob|  5|
+-----+-----+---+

The same thing using a SQL query:


df.registerTempTable("df")
sqlContext.sql("""SELECT df1.name, df2.name, df2.age
                  FROM df AS df1 JOIN df AS df2
                  ON df1.name == df2.name""")

What is the purpose of this method?


Pretty much avoiding ambiguous column references.

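As a small sketch of the problem, reusing the joined_df built above: both sides of the self-join contribute a column named name, so an unqualified reference cannot be resolved, while the alias-qualified references can.

// Unqualified reference: raises an AnalysisException complaining that
// the reference "name" is ambiguous (illustrative, shown commented out):
// joined_df.select(col("name"))

// Alias-qualified references pick out the intended side:
joined_df.select(col("df1.name"), col("df2.age")).show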

Spark 1.6+


There is also a new as[U](implicit arg0: Encoder[U]): Dataset[U] which is used to convert a DataFrame to a Dataset of a given type. For example:


df.as[Person]
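
A slightly fuller sketch, assuming the sqlContext and the Person case class defined above (the implicits import supplies the Encoder[Person] that as[U] needs; in Spark 2.x you would import spark.implicits._ instead):

import sqlContext.implicits._

// Convert the untyped DataFrame into a typed Dataset[Person]
val ds = df.as[Person]

// Typed, lambda-based operations are now available alongside the untyped ones
ds.filter(_.age > 3).map(_.name).show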