org.apache.spark.sql.AnalysisException: cannot resolve given input columns
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/43875245/
Asked by ozzieisaacs
exitTotalDF
.filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
.groupBy("exiturl")
.agg(first("accid"), first("segment"), $"exiturl", sum("session"), sum("sessionfirst"), first("date"))
.orderBy(desc("session"))
.take(500)
org.apache.spark.sql.AnalysisException: cannot resolve '`session`' given input columns: [first(accid, false), first(date, false), sum(session), exiturl, sum(sessionfirst), first(segment, false)]
It's as if the sum function cannot find the column names properly.

Using Spark 2.1
Answered by Derek_M
Typically in scenarios like this, I'll use the as method on the column. For example: .agg(first("accid"), first("segment"), $"exiturl", sum("session").as("session"), sum("sessionfirst"), first("date")). This gives you more control over what to expect, and if the summation name were ever to change in future versions of Spark, you will have less of a headache updating all of the names in your dataset.
Also, I just ran a simple test: when you don't specify a name, the column name in Spark 2.1 gets changed to "sum(session)". One way to find this yourself is to call printSchema on the dataset.
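Putting this together with the query from the question, a sketch of the fix could look like the following (this assumes the same exitTotalDF DataFrame from the question; note that $"exiturl" is omitted from agg because groupBy("exiturl") already carries that column into the output):

```scala
import org.apache.spark.sql.functions._

// Sketch: alias each aggregate so later clauses can refer to stable
// column names instead of generated ones like "sum(session)".
val result = exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(
    first("accid").as("accid"),
    first("segment").as("segment"),
    sum("session").as("session"),          // orderBy below can now resolve "session"
    sum("sessionfirst").as("sessionfirst"),
    first("date").as("date"))
  .orderBy(desc("session"))

result.printSchema() // verify the aliased column names
```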
Answered by Sruthi Poddutur
I prefer using withColumnRenamed() instead of as() because:

With as() one has to list all the columns needed, like this:
df.select(first("accid"),
  first("segment"),
  $"exiturl",
  col("sum(session)").as("session"),
  sum("sessionfirst"),
  first("date"))
Versus withColumnRenamed, which is a one-liner:

val df1 = df.withColumnRenamed("sum(session)", "session")
The output df1 will have all the columns that df has, except that the sum("session") column is now renamed to "session".
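Applied to the query from the question, the rename-after-aggregation approach might look like this (a sketch assuming the same exitTotalDF DataFrame; "sum(session)" is the default name Spark 2.1 generates for the unaliased aggregate):

```scala
import org.apache.spark.sql.functions._

val renamed = exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(first("accid"), first("segment"), sum("session"), sum("sessionfirst"), first("date"))
  .withColumnRenamed("sum(session)", "session") // rename the generated column once
  .orderBy(desc("session"))                     // now resolves "session"
```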

