
Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31537420/


Usage of spark DataFrame "as" method

scala, apache-spark, dataframe, apache-spark-sql

Asked by Prikso NAI

I am looking at the spark.sql.DataFrame documentation.


There is

def as(alias: String): DataFrame
    Returns a new DataFrame with an alias set.
    Since
        1.3.0 

What is the purpose of this method? How is it used? Can there be an example?


I have not managed to find anything about this method online and the documentation is pretty non-existent. I have not managed to make any kind of alias using this method.


Answered by zero323

Spark <= 1.5


It is more or less equivalent to SQL table aliases:


SELECT *
FROM table AS alias;

Example usage adapted from the PySpark alias documentation:


import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(
    Person("Alice", 2) :: Person("Bob", 5) :: Nil)

// Alias the same DataFrame twice so the self-join can tell the two sides apart
val df_as1 = df.as("df1")
val df_as2 = df.as("df2")

val joined_df = df_as1.join(
    df_as2, col("df1.name") === col("df2.name"), "inner")

// Qualified references like "df1.name" resolve against the aliases
joined_df.select(
    col("df1.name"), col("df2.name"), col("df2.age")).show

Output:


+-----+-----+---+
| name| name|age|
+-----+-----+---+
|Alice|Alice|  2|
|  Bob|  Bob|  5|
+-----+-----+---+

The same thing using a SQL query:


df.registerTempTable("df")
sqlContext.sql("""SELECT df1.name, df2.name, df2.age
                  FROM df AS df1 JOIN df AS df2
                  ON df1.name == df2.name""")

What is the purpose of this method?


Pretty much avoiding ambiguous column references.

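As a small sketch of the problem, reusing the joined_df built above: both sides of the self-join contribute a column named name, so an unqualified reference cannot be resolved, while the alias-qualified references can.

// Unqualified reference: raises an AnalysisException complaining that
// the reference "name" is ambiguous (illustrative, shown commented out):
// joined_df.select(col("name"))

// Alias-qualified references pick out the intended side:
joined_df.select(col("df1.name"), col("df2.age")).show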

Spark 1.6+


There is also a new as[U](implicit arg0: Encoder[U]): Dataset[U] which is used to convert a DataFrame to a Dataset of a given type. For example:


df.as[Person]
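
A slightly fuller sketch, assuming the sqlContext and the Person case class defined above (the implicits import supplies the Encoder[Person] that as[U] needs; in Spark 2.x you would import spark.implicits._ instead):

import sqlContext.implicits._

// Convert the untyped DataFrame into a typed Dataset[Person]
val ds = df.as[Person]

// Typed, lambda-based operations are now available alongside the untyped ones
ds.filter(_.age > 3).map(_.name).show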