org.apache.spark.sql.AnalysisException: cannot resolve given input columns
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/43875245/
Asked by ozzieisaacs
exitTotalDF
.filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
.groupBy("exiturl")
.agg(first("accid"), first("segment"), $"exiturl", sum("session"), sum("sessionfirst"), first("date"))
.orderBy(desc("session"))
.take(500)
org.apache.spark.sql.AnalysisException: cannot resolve '`session`' given input columns: [first(accid, false), first(date, false), sum(session), exiturl, sum(sessionfirst), first(segment, false)]
It's as if the sum function cannot find the column names properly.

Using Spark 2.1
Answered by Derek_M
Typically in scenarios like this, I'll use the as method on the column. For example: .agg(first("accid"), first("segment"), $"exiturl", sum("session").as("session"), sum("sessionfirst"), first("date")). This gives you more control over what to expect, and if the summation name were ever to change in future versions of Spark, you will have less of a headache updating all of the names in your dataset.
Also, I just ran a simple test: when you don't specify a name, the column name in Spark 2.1 gets changed to "sum(session)". One way to find this yourself is to call printSchema on the dataset.
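Putting this together with the query from the question, a sketch of the fix could look like the following (this assumes the same exitTotalDF DataFrame from the question; note that $"exiturl" is omitted from agg because groupBy("exiturl") already carries that column into the output):

```scala
import org.apache.spark.sql.functions._

// Sketch: alias each aggregate so later clauses can refer to stable
// column names instead of generated ones like "sum(session)".
val result = exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(
    first("accid").as("accid"),
    first("segment").as("segment"),
    sum("session").as("session"),          // orderBy below can now resolve "session"
    sum("sessionfirst").as("sessionfirst"),
    first("date").as("date"))
  .orderBy(desc("session"))

result.printSchema() // verify the aliased column names
```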
Answered by Sruthi Poddutur
I prefer using withColumnRenamed() instead of as() because:

With as() one has to list all the columns needed, like this:
df.select(first("accid"),
  first("segment"),
  $"exiturl",
  col("sum(session)").as("session"),
  sum("sessionfirst"),
  first("date"))
Versus withColumnRenamed, which is a one-liner:

val df1 = df.withColumnRenamed("sum(session)", "session")
The output df1 will have all the columns that df has, except that the sum("session") column is now renamed to "session".
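Applied to the query from the question, the rename-after-aggregation approach might look like this (a sketch assuming the same exitTotalDF DataFrame; "sum(session)" is the default name Spark 2.1 generates for the unaliased aggregate):

```scala
import org.apache.spark.sql.functions._

val renamed = exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(first("accid"), first("segment"), sum("session"), sum("sessionfirst"), first("date"))
  .withColumnRenamed("sum(session)", "session") // rename the generated column once
  .orderBy(desc("session"))                     // now resolves "session"
```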

