Scala Spark DataFrame: dataFrame.select multiple columns given a sequence of column names
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address and author information, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/36131716/
Asked by Himaprasoon
val columnName = Seq("col1","col2",....."coln")
Is there a way to do a dataframe.select operation that returns a DataFrame containing only the specified columns?
I know I can do dataframe.select("col1","col2"...), but columnName is generated at runtime.
I could call dataframe.select() repeatedly for each column name in a loop. Would that have any performance overhead? Is there a simpler way to accomplish this?
Answered by Tzach Zohar
import org.apache.spark.sql.functions.col

val columnNames = Seq("col1","col2",....."coln")
// using the string column names (the first name, then the rest as varargs):
val result = dataframe.select(columnNames.head, columnNames.tail: _*)
// or, equivalently, using Column objects:
val sameResult = dataframe.select(columnNames.map(c => col(c)): _*)
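To make the two forms above easier to try out, here is a minimal end-to-end sketch. The local SparkSession, the sample data, the object name SelectByNames, and the helper selectByNames are all assumptions made for illustration; they are not part of the original answer. It assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectByNames {
  // Hypothetical helper: keep only the columns named in `names`.
  def selectByNames(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.map(col): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-by-names")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with three columns.
    val dataframe = Seq((1, "a", true), (2, "b", false)).toDF("col1", "col2", "col3")

    // Column names that would normally be computed at runtime.
    val columnNames = Seq("col1", "col3")

    // String overload: select(col: String, cols: String*)
    val byName = dataframe.select(columnNames.head, columnNames.tail: _*)

    // Column overload: select(cols: Column*)
    val byColumn = selectByNames(dataframe, columnNames)

    println(byName.columns.mkString(", "))   // col1, col3
    println(byColumn.columns.mkString(", ")) // col1, col3

    spark.stop()
  }
}
```

Both calls issue a single select, so there is no need to loop over the column names one at a time.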
Answered by UserszrKs
Since dataFrame.select() expects its columns as variable arguments and we have a sequence of strings, we need to convert each string to a Column and then expand the resulting sequence into arguments. columnName.map(name => col(name)) gives a sequence of Columns from the sequence of strings, and the trailing : _* lets that sequence be passed as the parameters to select():
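import org.apache.spark.sql.functions.col
val columnName = Seq("col1", "col2")
// DF is an existing DataFrame that contains col1 and col2
val DFFiltered = DF.select(columnName.map(name => col(name)): _*)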
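The : _* syntax is plain Scala rather than anything Spark-specific, so it can be illustrated without a cluster. The sketch below uses two hypothetical functions, selectNames and selectAll, that only mimic the shapes of the two select overloads (with String standing in for Column); they are not Spark APIs.

```scala
object VarargsDemo {
  // Hypothetical stand-in for select(col: String, cols: String*).
  def selectNames(first: String, rest: String*): Seq[String] = first +: rest

  // Hypothetical stand-in for select(cols: Column*), with String in place of Column.
  def selectAll(cols: String*): Seq[String] = cols

  def main(args: Array[String]): Unit = {
    val columnName = Seq("col1", "col2", "col3")

    // `: _*` expands the Seq into individual arguments.
    val expanded = selectAll(columnName: _*)

    // A "first plus rest" signature needs the head passed separately from the tail.
    val headAndTail = selectNames(columnName.head, columnName.tail: _*)

    println(expanded == headAndTail) // true

    // An empty Seq is fine for the pure-varargs shape.
    println(selectAll(Seq.empty[String]: _*).isEmpty) // true
  }
}
```

One practical difference: columnName.head throws on an empty sequence, so the head/tail form needs at least one column name, while the pure-varargs form accepts an empty sequence.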

